Introduction to SAP ABAP internal tables
In this SAP Press book chapter download, find an overview of SAP ABAP internal tables. Learn about the different ABAP table types available.
In this SAP Press book chapter download, find an overview of SAP ABAP internal tables. Learn about the different ABAP table types available.
7 Processing of Internal Tables
Internal tables are among the most complex data objects available in the ABAP environment. The use of internal tables lets you store dynamic datasets in the main memory. Internal tables are comparable to arrays and they spare the programmer the effort of program-controlled memory management thanks to their dynamic nature. The data in internal tables is managed per row, whereas each row has the same structure.
In most cases, internal tables are used for the buffering or formatting of contents from database tables. The type of access to internal tables plays an important role for performance, as is the case with database tables. Experience shows that the tuning of internal tables enables similarly major effects as the tuning of database accesses. The negative effects of inefficient accesses to internal tables for the overall system can be compensated more easily than inefficient database accesses by adding further CPUs or application servers. Inefficient database accesses affect the database as a central resource, whereas inefficient accesses to internal tables impact the better scalable application layer (see Chapter 2).
This excerpt from ABAP Performance Tuning by Hermann Gahm is reprinted here with permission from SAP Press, copyright 2011. Download a PDF of this chapter.
The following sections first provide a general overview of the internal tables. This is followed by a description of how the internal tables are organized in the main memory. The subsequent section discusses the different types of internal tables. The major part of this chapter then details the performance aspects for the processing of internal tables. Typical problematic examples and solution options are presented here.
7.1 Overview of Internal Tables
Internal tables are completely specified by four properties:
1. Table type The access type to the table type determines how ABAP accesses the individual table rows. Section 7.3, Table Types, discusses this topic in great detail.
2. Row type The row type of an internal table can be any ABAP data type.
3. Uniqueness of the key The key can be specified as unique or non-unique. In case of unique keys, there are no multiple entries (regarding the key) in the internal tables. The uniqueness is based on the table type. Standard tables only allow for non-unique keys and hashed tables only for unique keys.
4. Key components (taking the sequence into account) The key components and their sequence specify the criteria based on which the table rows are identified.
Figure 7.1 illustrates this syntactically.
Figure 7.1 Internal Tables — Declaration
The combination of access type and table type is mainly relevant for the performance. Section 7.3, Table Types, discusses the various access types and table types.
Before describing the table types in detail, let’s first discuss the organization of internal tables in the main memory.
7.2 Organization in the Main Memory
In the main memory, the internal tables, just like the database tables, are organized in blocks or pages. In the context of internal tables, the following sections use the term pages.
When an internal table is declared in an ABAP program, the system only creates a reference (table reference) in the main memory initially. Only when entries are written to the table does the system create a table header and a table body. Figure 7.2 shows a schematic diagram of the organization in the main memory.
Figure 7.2 Schematic Diagram of the Organization of the Internal Tables in the Main Memory
The table header has a reference to the first page of the table body and another reference to page management. Page management manages the addresses of the pages in the main memory.
The table reference currently occupies 8 bytes of memory space. The table header occupies about 100 bytes of memory space depending on the platform. The space required for page management depends on the number of pages.
The table body consists of pages that can include the table rows. The first two pages are — depending on the row length and other factors — usually smaller than the pages 3 to n (if the row lengths are not so long that the maximum page size is reached already at the beginning).
As of the third page, the pages are created with the maximum page size, which is usually between 8 KB and 16 KB. This depends on the length of the row. Unlike database tables, the access is not per page but per row. So if you access a row of an internal table, the system reads only one row. The effort for searching table entries (or data records) is comparable to the database tables. For this purpose, the index or hash administration provides support for the internal tables. You learn more about internal tables in Section 7.3, Table Types, for the table types because they are directly related to this topic.
The table header includes the most important information about an internal table. For example, you can quickly query the number of rows using DESCRIBE TABLE <itab> LINES <lines> or the integrated function, LINES( itab ), from the table header.
As very small internal tables with only a few rows can result in wastage due to the memory use of the automatically calculated first page, INITIAL SIZE is added for the declaration of internal tables. It can provide information on the size of the first page, so a smaller memory allocation than in the standard case occurs.
However, if considerably more rows are required than originally specified for INITIAL SIZE, the third page is created faster with the maximum page size. For example, if 4 was specified for INITIAL SIZE, the third page may already be required as of the 13th row if the second page is twice as large as the first page. Relatively few rows (13, for example) require relatively much memory (three pages, third page with a size of 8 to 16 KB), whereas one page would have been sufficient if a higher value (for example, 14) had been specified for INITIAL SIZE. Consequently, for small tables it is important that INITIAL SIZE is not selected too small. Select a value that provides sufficient space in the first (or first and second) page for most cases.
INITIAL SIZE should always be specified if you require only a few rows and the internal table exists frequently. For nested tables, if an internal table is part of a row of another internal table, this is likely for the inner internal table. It can also occur for attributes of a class if there are many instances of this class.
Caution: INITIAL SIZE and APPEND SORTED BY
In conjunction with the APPEND wa SORTED BY comp command, the INITIAL SIZE addition not only has a syntactic but also a semantic meaning (see documentation). However, don’t use the APPEND wa SORTED BY comp command; instead, work with the SORT command.
Depending on the table type or type of processing, you also require a management for the access to the row, that is, an index for the index tables and a hash administration for the hashed tables. At this point, memory may be required for the management of entries in addition to the pages. This management also occupies memory. Both in the Debugger and in the Memory Inspector, this memory is added to the table body and not displayed separately. Compared to the user data, this management can generally be neglected.
But how can you release allocated space in the internal tables again? The deletion of individual or multiple rows from the internal table using the DELETE itab command doesn’t result in any memory release. The rows concerned are only “selected” as deleted and not deleted from the pages.
Only when you use the REFRESH or CLEAR statements the system does release the pages of the internal tables again. Only the header and a small memory area remain.
In this context, released means that the occupied memory can be reused. As the memory allocation from the Extended Memory (EM) for a user is usually done in blocks (see Section 6.1 in Chapter 6), which are considerably larger than the pages of an internal table, this is referred to as a two-level release. Release initially means that the pages within an EM block are released and this space can then be reused by the same user. Only if the EM block is completely empty and doesn’t contain any data (variables, and so on) of the user any longer is this block returned to the SAP memory management and available for the other users again.
The FREE itab ABAP statement, however, results in the complete de-allocation of the table body, that is, all pages and the index (if available) of the internal tables are released. Additionally, the table header is added to a system-internal “free list” for reuse.
If an internal table should be reused, it is advisable to use REFRESH or CLEAR instead of FREE because this way the creation of the first page can be omitted. If a large part of the rows of an internal table was deleted using DELETE and the occupied memory should be released, it is recommended to copy the table rows. A simple copy to another internal table is not sufficient because of table sharing, which is discussed in Section 7.4, Performance Aspects. Alternatively, you can revert to ABAP statements (INSERT or APPEND) or to the EXPORT/IMPORT variants (see Section 6.2.2 in Chapter 6) for copying. In the context of performance, the “release” of memory only plays a secondary role (as long as no memory bottleneck exists in the system). In contrast to fragmented database tables, fragmented internal tables have no negative effects on the performance because the entries can always be addressed efficiently because internal tables are always managed per row.
Background: Difference between Internal Tables and Database Tables
Internal tables can be compared to database tables in many respects, but there is one major difference:
Internal tables are always processed on a row basis, whereas database tables are always processed on a set basis. A set-based processing, possible with Open SQL on database tables, is not possible on internal tables because the single row is the main processing criterion for internal tables, whereas a set of data records is the main processing criterion for database tables. Set-based accesses to internal tables, for instance, LOOP ... WHERE or DELETE ... WHERE, are emulated by the ABAP VM and can be mapped in an optimized way for some table types (see Section 7.4, Performance Aspects). More complex, set-based operators, such as joins and aggregates,… are not possible on internal tables. They must be programmed using the existing ABAP language techniques.
After you’ve learned about the organization of internal tables in the main memory, the next section focuses on the organization of internal tables and discusses the different types of internal tables.
7.3 Table Types
Internal tables can be subdivided into index tables and hashed tables. The index tables, in turn, can be divided into standard tables and sorted tables. Figure 7.3 shows an overview of the table types.
Figure 7.3 Overview of the Table Types
The table type specifies how you can access individual table rows via ABAP.
For standard tables, the access can be implemented via the table index or a “key.” For a key access, the response time depends linearly on the number of table entries because the read access corresponds to a linear scan of the entries, which is canceled after the first hit. The key of a standard table is always non-unique. If no key is specified, the standard table receives the default key, which is a combination of all character-like fields.
Sorted tables are always sorted by the key. The access can be carried out via the table index or the key. For a key access, the response time depends logarithmically on the number of table entries because the read access is carried out via a binary search. The key of sorted tables can be unique or non-unique. On sorted tables, you can process partial keys (initial parts of the complete key) in an optimized manner. An over-specification of the table key is also possible; only the components of the key are used for the search and the remaining components are then utilized for the filtering.
Standard tables and sorted tables are also referred to as index tables because both tables can be accessed using the table index.
The read access to hashed tables is only possible by specifying a key. Here, the response time is constant and doesn’t depend on the number of table entries because the access is carried out via a hash algorithm. The key of hashed tables must be unique. Neither explicit nor implicit index operations are permitted on hashed tables. If a hashed table is accessed with a “key” that is different to the unique table key, the table is handled like a standard table and searched linearly according to the entries. This is also the case for a partial key. Different to the sorted table, this partial key cannot be optimized for the hashed table. Over-specified keys are processed in an optimized manner.
By means of the DESCRIBE TABLE <itab> KIND <k> statement, you can determine the current table type at runtime. Of course, this is also possible using Run Time Type Identification (RTTI).
An index or a hash administration is available for the efficient management or access optimization of internal tables. The following section describes which types are available and when they are created.
Indexes for index tables are only created when the physical sequence no longer corresponds to the logical sequence, that is, when one of the INSERT, DELETE, or SORT statements is executed on the table and the following conditions apply:
1. INSERT The entry to be inserted should be inserted before an already existing entry. (An INSERT statement that inserts behind the last record largely corresponds to an APPEND statement.) 2. DELETE The entry to be deleted is not the last entry of the table. 3. SORT The table has a certain size and is sorted.
An index is used for the efficient index access in the “logical sort sequence” or the efficient finding of “valid rows” if the table pages have gaps due to deletions. By means of the index, the logical sequence of the table is mapped on the physical memory addresses of the entries.
An index is available in two types: 1. As a linear index 2. As a tree-like index
The index structure is always maintained without any gaps, whereas the table pages may have gaps due to the deletion of records. In comparison to the management of the index without gaps, the management of the table pages without gaps would be too time consuming for larger tables.
Due to the management of the index structure without gaps, the insertion and deletion of records incur movement costs because the existing entries must be moved. Strictly speaking, these costs are overheads for copying. For large indexes (as of about 5,000 entries), they get dominant; this is why a tree-like index is created for large tables.
In addition to the index, a free list exists that manages the addresses of the entries that were deleted using DELETE for reuse.
Figure 7.4 shows a schematic diagram of a linear index.
Figure 7.4 Schematic Diagram of a Linear Index
Whether a tree-like index is created depends on system-internal rules, for example, the number of entries (to be expected), and other factors. Figure 7.5 shows a schematic diagram of a tree-like index.
Figure 7.5 Schematic Diagram of a Tree-Like Index
For the tree-like index, the index entries are organized in leaves. The previously mentioned movement or copy costs only incur at the leaf level. The index doesn’t have to be allocated at once; you only require continuous memory at leaf level. In return, you must first navigate through the tree structure when you access the index to reach the respective index entry.
Apart from that, the tree-like indexes on index tables are comparable to the database indexes presented in Chapter 5. A tree-like index requires about 50% more space than a linear index.
If the logical sequence of entries corresponds to the physical sequence in the main memory when the index tables are processed, you don’t need to create an index. In this case, the insertion sequence corresponds to the physical sequence, and the table was filled with a sorting and not deleted or sorted. If no index is necessary, the internal table requires less memory.
The hash administration is based on the unique key of the table. The hash administration is created for hashed tables only. It is established using the unique key of the internal table. Index accesses (for example, second entry of the internal table) are not possible, hashed tables can only be accessed with the key.
For the hashed table, each key value is assigned to a unique number using a hash function. For this number, the memory address of the respective data record is stored in a corresponding hash array.
If a DELETE or SORT is executed on a hashed table, you must create a doublelinked list (previous and next pointer), so sequential accesses (LOOP) via the data are still possible according to the insertion sequence (or in a sort sequence generated using SORT). The double-linked list requires about 50% more space for the hash administration.
Figure 7.6 shows a schematic diagram of a hash administration.
Figure 7.6 Schematic Diagram of a Hash administration
Besides the memory that is available to the user, there are further limitations for internal tables:
A limit for the number of rows in internal tables results because they are addressed internally and in ABAP statements via 4 byte integers, which limits them to 2,147,483,647 entries.
The size of hashed tables is further limited by the biggest memory block available at once. The maximum size is 2 GB, but it is usually further limited by the ztta/max_memreq_MB profile parameter. The maximum number of rows of hashed tables depends on the required size of the hash administration that must be stored there.
The actual maximum size of internal tables is usually smaller than specified by the previous limits because the overall available memory is usually not only used by a string or an internal table (see ABAP documentation: Maximum size of dynamic data objects).
Summary of the Table Types
Table 7.1 lists the most important characteristics of the table types. This is followed by a recommendation for when you should use which table type.
|Standard Table||Sorted Table||Hashed Table|
|Possible Accesses||Index access or key access||Index access or key access||Key access|
|Uniqueness||Non-unique||Non-unique or unique||Unique|
|Optimal Access||Index or binary search (if the table is sorted by the search components)||Index or key||Key|
Table 7.1 Characteristics of Table Types
Standard tables should only be used if all entries should be processed sequentially after filling or if the internal tables should be accessed flexibly and efficiently using multiple different keys. For this purpose, the table must be sorted by the search field and scanned using the binary search. The resorting is carried out only as often as necessary. If a resorting is only required for one or a few read accesses, the sort times far outweigh the time savings for reading. Use key accesses without binary search only for small tables or better avoid them completely. If you search only via a specific field, use a sorted or hashed table.
Sorted tables are particularly suited for partially sequential processing, for example, when a small part of a table should be processed via key accesses for which only the initial part of the key is given. Key accesses to the table key can also be carried out efficiently by the sorted tables.
Hash tables are optimal if you access only via the table key. If the key has a highleft significance, you can also use a unique sorted table because in this case performance benefits arise for the binary search when you access individual rows. In this context, left significance means that the selective part of a key should be positioned at the beginning of the key (as far to the left as possible).