12 Globalization Support

Globalization Support is a technology enabling Oracle applications to interact with users in their native language, using their conventions for displaying data.

This chapter includes the following sections:

Overview
Obsolete NLS Parameters
DB2 Character Sets Handled Automatically
Oracle Database 10g for z/OS
Oracle Database Server and Client Configuration
Message Availability

12.1 Overview

The Oracle Globalization Support architecture is data-driven, enabling you to add support for specific languages and character encoding schemes without requiring any changes in source code.

A number of different settings in the gateway, DB2, Oracle database server, and client affect Globalization Support processing. Character settings of these components must be compatible so that translations occur correctly. Each character in one encoding scheme must have a matching character in the other encoding schemes.

12.2 Obsolete NLS Parameters

These NLS parameters were used in earlier releases of the DB2 Gateway (before 10.2) but are obsolete now for the DB2 Gateway only. Other gateways may still continue to use these parameters:

NLS_LANG
NLS_NCHAR
NLS_DATE_FORMAT
HS_LANGUAGE
HS_NLS_NCHAR
HS_NLS_DATE_FORMAT

Do not specify any value for these parameters. This will cause failures in the DB2 Gateway.

If you are converting from a earlier release of the DB2 Gateway, do not use the parameters from that release. Use the parameters that are generated by the installation process as there are new required parameters that will cause the gateway to fail if they are missing, and obsolete parameters that must no longer be specified.

12.3 DB2 Character Sets Handled Automatically

Internally, the DB2 Gateway handles all DB2 Coded Character Set IDs (CCSIDs) automatically. The gateway detects the CCSIDs of DB2 columns and stored procedure parameters and handles all conversions with no parameter specification required by the user.

First, a little background on character set handling in the DB2 Gateway.

Unicode is the name for a family of industry-standard character sets. The Unicode character sets are designed to be all encompassing, that is, they can represent every code point in every character set. This makes Unicode an ideal lingua-franca (common language (or common character set in this case)) for exchanging electronic data.

The interesting Unicode character sets (there are many) are normally referred to as UTF8 (AL32UTF8 is Oracle's Globalization Support character set for Unicode Version 4 UTF8) which is a variable width character set (each character set is represented by 1 to 3 bytes), and UCS2 (AL16UTF16 is Oracle's Globalization Support character set for Unicode Version 4 UCS2) which is a fixed-width 16-bit (double-byte) character set.

With DB2 7.1 and later, now supporting EBCDIC, ASCII, and UNICODE tables, plus single-byte, mixed and double-byte CCSIDs for EBCDIC, ASCII, and UNICODE (9 CCSIDs for each DB2 subsystem), a different approach had to be taken to fully support all character sets.

The way the DB2 CCSIDs are handled is different depending on what is being done.

All SQL statements going into the gateway are converted to Unicode (AL32UTF8) before being sent to DB2. This includes passthrough SQL statements. If the Oracle server is already running AL32UTF8, no conversion is needed. So, it is highly recommended to run the Oracle server in AL32UTF8.

All input bind variables for SQL (for SELECT, INSERT, UPDATE and DELETE) are converted to Unicode (AL32UTF8). All bind variables for passthrough SQL are also converted to Unicode (AL32UTF8).

This approach of converting all SQL statements and bind variables to Unicode (AL32UTF8) allows all possible code points to be sent to DB2. No matter what character sets DB2 is running in, all code points for DB2's character sets can appear in constants in SQL statements and bind variables for SQL.

SELECT results are handled differently. The results of SELECT or passthrough SELECT statements are fetched from DB2 in the character set of the column with no character set translation. The data is then translated directly to the character set of the Oracle server. This minimizes the number of translations, and in the case of selecting DB2 Unicode data with an Oracle server running in Unicode, no character set translations take place. The same would be true for selecting DB2 ASCII data with an Oracle server running in the same ASCII character set.

There is one exception to how SELECT results are handled, and that is for the DB2 GRAPHIC data types (GRAPHIC, VARGRAPHIC and DBCLOB). These data types are always fetched in UCS2 (AL16UTF16) for easier handling internally by the gateway. The GRAPHIC column data is then translated to the character set of the server.

Stored procedures and user-defined functions (UDFs), because they require parameters of a specific length for both input and output, require a different approach.

Input, output and in/out parameters to stored procedures and UDFs are translated from the character set of the Oracle server to the CCSID of that DB2 parameter on input, and from the CCSID of the parameter to the character set of the Oracle server on output. If a stored procedure is defined to use Unicode parameters and the Oracle server is running in Unicode, then no translation takes place. The same would be true for ASCII parameters in the same character set as the Oracle server.

Because the gateway can now handle DB2 data in any character set, the character set of the Oracle server now becomes the limiting factor for character set support. If your Oracle server was installed with the default database character set of US7ASCII, then only the 26 Latin alphabetic characters are supported. If you use a regional character set (for example- Western Europe – WE8ISO8859P1), then you will get support for all the characters in the 10 Western European languages that this character set supports. Oracle recommends the use of an Oracle server with a database character set of AL32UTF8, because this character set supports every other character set in existence. If your Oracle server is using UTF8, then it should be change to use AL32UTF8; otherwise character set translations will occur between the Oracle server and the DB2 Gateway.

If you are using Oracle Database 10g for z/OS as the integrating server, then refer to the next section, "Oracle Database 10g for z/OS," for further details, prior to installing the gateway and Oracle Database 10g for z/OS.

12.4 Oracle Database 10g for z/OS

Beginning with Oracle Enterprise Edition for OS/390 release 7.3.2, the Oracle database server is created with the default character set WE8EBCDIC1047 to facilitate compatibility with the IBM z/OS compiler character set.

When Oracle Enterprise Edition for OS/390 release 7.3.2 or later is used as the integrating server in your Oracle Transparent Gateway for DB2 configuration, it is important (when creating the Oracle Database 10g for z/OS database) to consider using the same EBCDIC character set for Oracle Database 10g on z/OS as the DB2 EBCDIC installation CCSID. In most cases, the DB2 CCSID is 37 (WE8EBCDIC37 in Oracle Globalization Support character sets). It is important to confirm this setting because the default setting for the Oracle database character set on z/OS is WE8EBCDIC1047, which is IBM's EBCDIC 1047 that is not a supported CCSID for DB2.

If the character sets vary between the Oracle Database 10g for z/OS and the DB2 EBCDIC installation CCSID, then performance can be impacted because all data requires translation. The Oracle Database 10g for z/OS must run in some EBCDIC database character set, Unicode (AL32UTF8) cannot be used for the database character set on z/OS. In addition, the NLS_LANG specification of a z/OS client process (such as SQL*Plus or client application programs) should be the same for the Oracle Database 10g for z/OS database to avoid translation between the client process and Oracle Database 10g for z/OS.

12.4.1 Double-Byte Character Support

All DB2 GRAPHIC (double-byte) data types (GRAPHIC, VARGRAPHIC, and DBCLOB) are automatically mapped to their Oracle CHAR equivalents. Mapping of DB2 GRAPHIC data types to Oracle NCHAR data types is no longer supported. There is an environment variable that controls certain aspects of the double-byte character set support, specified in the ORA$ENV DD.

The following describes the parameter 'ORADBMBOPT'.

This version of the Oracle Transparent Gateway for DB2 allows you to access and change columns in DB2 that are designated as VARGRAPHIC or GRAPHIC. Because of the IBM definition of a (VAR)GRAPHIC column, you can insert only double-byte characters into (VAR)GRAPHIC columns.

Some of IBM's double-byte character sets (the EBCDIC ones) do not have a double-byte character that represents the regular letters A-Z. They only have Òdouble-wideÓ versions of A-Z which are a set of different characters.

All of IBM's ASCII double-byte character sets and IBM's Unicode double-byte character set do not have this problem. They have a representation for the regular letters A-Z and for a double-wide A-Z.

Because the DB2 Gateway provides all SQL statements and SQL bind variables to DB2 in Unicode (AL32UTF8), any single or double-byte character can be sent to DB2 in literals or bind variables, and DB2 automatically converts single-byte characters to their double-byte equivalent in those character sets that do not have a representation for single-byte characters.

Inserting a value represented by a mixed-byte string constant into a (VAR)GRAPHIC column defined to use an EBCDIC character set will result in automatic conversion by DB2 of single-byte only characters to their double-byte equivalents.

For example, an ASCII client issuing the following INSERT statement:

INSERT INTO mytable.graphcol@tg4db2 values('AxxB')

Where A and B are the normal ASCII single-width characters and xx designates a double-width character in one of the Asian ASCII-based character sets such as JA16SJIS, ZHS16CGB213280, ZHT16BIG5, or KO16KSC5601. Where no double byte character corresponds to a single-byte character, it is converted by DB2 to its double-byte equivalent. In the above example, A and B would be stored as double-wide A and B.

When selecting back the above values, it is unclear when we find a double-wide letter whether it should be converted back to a regular letter, or whether it should be returned as a double-wide letter (for example, Unicode AL32UTF8 has encodings for both a regular A and a double-wide A).

This version of the Oracle Transparent Gateway for DB2 provides a workaround, but the workaround carries a risk for data integrity. If you desire to choose this path, then you must realize that if you attempt to INSERT a string with a single-byte character, then that string may be returned to you entirely as a DBCS string in a subsequent SELECT operation as a double-wide character. The rules are as follows:

On input, regardless of ORADBMBOPT option, the single-byte characters would still be converted to their double-byte equivalents before being inserted to DB2 graphic datatypes. This is different from 10gR1 or prior gateway in which the single-byte characters would be inserted to DB2 graphic datatypes as is, if ORADMBOPT is not set to FORCE_SB. The same is true for values represented by constants or for values represented by bind variables.
On output, ORADBMBOPT option would make a difference. If ORADBMBOPT=FORCE_SB is specified, then on a SELECT, each (VAR)GRAPHIC column will be examined for double-byte characters that correspond to single-byte characters. Each such character is replaced by its corresponding single-byte character. This is where the data integrity problem arises. Because single-byte characters have been forced to their corresponding double-byte characters on input, there is no way to know if a double-byte character actually came from the translation of an ASCII wide character or from the process of forcing single-byte characters to the corresponding double-byte characters. If ORADBMBOPT is not set to FORCE_SB, then the double-byte characters would not be replaced by its corresponding single-byte characters.

As an example, use the previous INSERT statement, which is:

INSERT INTO mytable.graphcol@tg4db24 values('AxxB')

The resultant DBCS value in the DB2 column may look like the following:

wAyywB

where wA is the DBCS correspondence (usually 0x42C1) for the double wide A, wB is the DBCS correspondence (usually 0x42C2) for the double wide B, and yy is the DBCS character corresponding to the ASCII-based xx. When SELECTing from this column, on the client you would get exactly what you INSERTed, that is, AxxB—which is good. But what if the DBCS character that is represented in the DB2 column by wa had actually been INSERTed into the column through a valid ASCII representation for a wide A? With the FORCE_SB option, you get a single-byte A on the client. This may not be exactly what you wanted.

In the end, it is you who must decide whether this option is valuable to you. You must decide if you can accept the possible problems that can arise.

Some character sets will get no double-wide characters returned (like US7ASCII). All double-wide characters will be returned as the substitution character for that character set (usually a?). Unicode (AL32UTF8) can represent both regular and double wide letters so if you run the Oracle server with that character set, then you may not want to use this option.

The ORADBMBOPT environment variable controls this feature. If there is no ORADBMBOPT environment variable, then no forcing of double-byte characters to the single-byte correspondences will take place when transferring data from DB2 columns. Placing the string FORCE_SBCS into ORADBMBOPT turns the feature ON.

One good workaround for this double-byte problem is to use Unicode for all your DB2 GRAPHIC columns. The double-byte Unicode includes code points for both regular and double-wide characters, so anything can be inserted into these columns and it will come back out the same.

Note that you should not use passthrough to execute any SQL commands that contain graphic constants, unless those constants conform fully to DB2 graphic constants. DB2 graphic constants start with G' (G apostrophe) or N' (N apostrophe) and end with an apostrophe (') character. These will be converted to Unicode (AL32UTF8), and the constants must follow the DB2 rules for forming Unicode literals.

12.5 Oracle Database Server and Client Configuration

There are a number of NLS parameters controlling Globalization Support processing between the Oracle database server and client. You can set language-dependent behavior defaults for the server and set language-dependent behavior for the client that overrides these defaults. For a complete description of NLS parameters, refer to the Globalization Support chapter in the Oracle Database SQL Reference. These parameters do not affect the gateway processing. However, you must ensure that the character set is compatible with the DB2 data's CCSIDs. In other words, each character in one encoding scheme must have a matching character in another encoding scheme.

When you create the database, the character set used to store data is specified by the CHARACTER SET parameter. Once the database is created, the database character set cannot be changed unless you re-create the database.

Normally, the default for CHARACTER SET on non-EBCDIC platforms is US7ASCII, which supports only the 26 Latin alphabetic characters. If you have specified 8-bit character sets on the gateway and DB2, then you must have a compatible 8-bit character set defined for your database. To check the character set of an existing database, enter the command:

SELECT USERENV('language') FROM DUAL;

12.6 Message Availability

The gateway itself never returns any message text, so the setting in NLS_LANG in the DB2 Gateway for language is not used. In fact, NLS_LANG should never be specified for the DB2 Gateway.

All message text is retrieved by the Oracle server, so set your language parameters on the server or the client side to see the message in a different language.

In some cases, DB2 error message text is embedded within an Oracle error message. DB2 supplies error message text only in English, so the DB2 error message text is never translated to other languages.