Oracle® Warehouse Builder Data Modeling, ETL, and Data Quality Guide 11g Release 2 (11.2) E10935-05 |
|
|
PDF · Mobi · ePub |
Oracle Warehouse Builder enables you to design mappings that perform ETL using dimensional objects. This chapter describes extracting data from, removing data from, and loading data into dimensional objects.
This chapter contains the following topics:
The Dimension operator enables you to perform ETL on dimensions, slowly changing dimensions (SCDs) and time dimensions. You can extract data from, load data into, or remove data from dimensions using the Dimension operator. The dimensions may be deployed in relational form to an Oracle Database or to an analytic workspace.
Use a Dimension operator as a target in a mapping to load data into dimensions and SCDs. Define a data flow from the operators that represent the source objects to the dimension or SCD.
Oracle Warehouse Builder loads data into the dimension starting from the highest level.
Note:
You cannot map a data flow to the surrogate identifier or the parent surrogate identifier reference of a level.While loading data into a dimension, Oracle Warehouse Builder checks if a similar record exists in the dimension by comparing the business identifier of the source records with the business identifier of the existing dimension records.
To load data into a dimension:
Define a mapping as described in "Defining Mappings".
Add a Dimension operator to the mapping. Ensure that this operator is bound to the dimension into which you want to load data.
For information about adding operators to mappings, see "Adding Operators to Mappings".
Add operators corresponding to the source objects and to any transformations that must be performed before the source data is loaded into the dimension.
Map the attributes from the source operators to intermediate transformation operators (if the source data is to be transformed before loading it into the dimension) and then to the target dimension. Complete mapping all attributes according to your requirements.
If a record with the same business identifier as the one being loaded exists in the dimension, the record is updated with the attribute values from the source; otherwise the source record is loaded into the dimension.
Set the "Loading Type" property of the Dimension operator to Load.
Validate the mapping by selecting the mapping in the Projects Navigator and clicking Validate on the toolbar.
Resolve errors, if any, that result from the validation process.
Generate the mapping by selecting the mapping in the Projects Navigator and clicking Generate on the toolbar.
The generation results are displayed in a new Results tab in the Log window. Resolve generation errors, if any.
Figure 6-1 displays a mapping that loads data into the PRODUCTS
dimension, represented by the operator PRODUCTS_OUT
. The source data is stored in two tables, CATEGORIES
and PRODUCTS
. The CATEGORIES
table stores both the category and subcategory information. So the data from this table is filtered using two Filter operators CATS
and SUBCATS
. The filtered data is then mapped to the CATEGORIES
level and the SUBCATEGORIES
dimension levels. The TOTAL
level of the dimension is loaded using a Constant operator. The data from the PRODUCTS
table is mapped directly to the PRODUCTS
level in the dimension.
Figure 6-1 Loading the Products Dimension
When you define a data flow to load a dimension, an inline pluggable mapping that loads data into the dimension is created. To view this pluggable mapping, select the Dimension operator on the Mapping Editor canvas and click Visit Child Graph on the graphical toolbar.
A Type 2 SCD stores both historic and current records. When the value of any "Triggering Attribute" in the Type 2 SCD is modified, the current record is marked as closed and a new record containing the changed values is created. A record is marked as closed by setting the value specified by the "Default Expiration Time of Open Record" property to the expiration date attribute. Regardless of the input connection defined in a mapping, the expiration date of a historic record is set using the Default Expiration Time of Open Record property.
Note:
When you load some Type 2 SCDs, if the target is an Oracle9i database, only row-based mode is supported.A workaround is to switch on hierarchy versioning, by setting the parent surrogate identifier reference attribute as a trigger for all levels.
Before loading records into the Type 2 SCD, Oracle Warehouse Builder checks if a record with the same business identifier exists in the Type 2 SCD. If the record does not exist, then Oracle Warehouse Builder adds the record to the Type 2 SCD. If the record exists, then Oracle Warehouse Builder performs the following steps.
Marks the existing record as closed by setting the value specified in the property "Default Expiration Time of Open Record".
Creates a new record using the changed attribute values.
If the effective date input for the level is not mapped, the effective time and expiration time are set using the "Default Effective Time of Open Record" and the "Default Expiration Time of Open Record" properties of the operator.
If the effective date input for the level is mapped, then the effective time of the new record is set to the value that is obtained from the effective date input data flow. The effective date input, if connected, represents the actual effective date of each individual new record.
Note:
To load multiple records for a particular business identifier during a single load, set the "Support Multiple History Loading" property for the Dimension operator that is bound to the Type 2 SCD.Steps to Load Data into Type 2 SCDs
Define a mapping as described in "Defining Mappings".
Add a Dimension operator to the mapping. Ensure that this operator is bound to the Type 2 SCD into which you want to load data.
For information about adding operators to mappings, see "Adding Operators to Mappings".
(Optional) Select the Dimension operator on the canvas by clicking the operator name, and use the Property Inspector to set the following properties:
If you do not explicitly set values for these properties, the default values are used.
Set the "Loading Type" property of the Dimension operator to Load.
Add operators corresponding to the source objects and to any transformations that must be performed before the source data is loaded into the dimension.
Map the attributes from the source operators through intermediate transformation operators (if the source data is to be transformed before loading it into the dimension) and then to the target dimension operator. Complete mapping all attributes according to your requirements.
Note:
You cannot map attributes to the expiration date attribute of the Type 2 SCD.Validate the mapping by selecting the mapping in the Projects Navigator and clicking Validate on the toolbar.
Resolve errors, if any, that result from the validation process.
Generate the mapping by selecting the mapping in the Projects Navigator and clicking Generate on the toolbar.
The generation results are displayed in a new Results tab in the Log window. Resolve generation errors, if any.
Mapping Source Attributes to the Effective Data Attribute
In a mapping that loads a Type 2 SCD, if you map attributes from the source operator to the effective date attribute of a level, Oracle Warehouse Builder does the following:
While loading the initial record, if the value of the source attribute is earlier than the value specified by the "Default Effective Time of Initial Record" property of the dimension operator (bound to the Type 2 SCD), Oracle Warehouse Builder uses the value from the source as the effective date of the record; otherwise Oracle Warehouse Builder takes the value specified in the Default Effective Time of Initial Record property as the effective date of the record.
During subsequent loads for the record, if the record is being versioned, Oracle Warehouse Builder takes the effective time of the new record from the source. If no value is given for the effective time, then SYSDATE
is used. Oracle Warehouse Builder sets the expiration time of the closed record to the effective time of the new record minus the gap.
If you do not map attributes from the source to the effective date attribute of a level, then Oracle Warehouse Builder does the following:
While loading the initial record, Oracle Warehouse Builder uses the value specified in the "Default Effective Time of Initial Record" property as the effective date of the record.
During subsequent loads for the record, if a new version is being created, Oracle Warehouse Builder takes the effective time of the new record from the source. If no value is given for the effective time, then SYSDATE
is used. Oracle Warehouse Builder takes the expiration time of the previous version as the effective time of the new version minus the gap.
For more information about the gap, see "Type 2 Gap" and "Type 2 Gap Units".
Note:
Mapping to the Expiration Date attribute of a level is not enabled. While loading a record, the Default Expiration Time of Open Record property is used as the expiration date. The default value of this property isNULL
.Example: Values Assigned to Type 2 SCD Versioned Records
You create a mapping that loads the Products
Type 2 SCD. The leaf level of this Type 2 SCD, Product
, is loaded from a source table.The effective date attribute of the Product
level is mapped from the source attribute EFF_DATE
.
The Dimension operator has the following properties:
Default Effective Time of Initial Record: 01-jan-2000
Default Effective Time of Open Record: SYSDATE
Default Expiration Time of Open Record: 01-jan-2099
Type 2 Gap: 1
Consider a source Product
level record with the value of EFF_DATE
as 21-mar-2007 10.25.05.000000 PM.When the initial Product
level record is loaded, the values assigned to the record are:
Effective date: 01-jan-2000
Expiration date: 01-jan-2099
When the Product
level record is versioned during a subsequent load on 21-mar-2007, the following occurs:
The value of the source attribute overrides the Default Effective Time of Open Record property. Thus, the effective date stored in the new Product
level record is 21-mar-2007 and the expiration date is set to 01-jan-2099.
The initial Product
level record is closed with the value of the expiration date set to 21-mar-200710.25.04.000000 PM.
Use the following steps to load data into Type 3 SCDs.
Define a mapping as described in "Defining Mappings".
Add a Dimension operator to the mapping. Ensure that this operator is bound to the Type 3 SCD into which you want to load data.
For information about adding operators to mappings, see "Adding Operators to Mappings".
Set the "Loading Type" property of the Dimension operator to Load.
Add operators corresponding to the source objects and to any transformations that must be performed before the source data is loaded into the Type 3 SCD.
Map the attributes from the source operators to intermediate transformation operators (if the source data is to be transformed before loading it into the dimension) and then to the target dimension operator. Complete mapping all attributes according to your requirements.
While loading data, Oracle Warehouse Builder checks if a record with the same business identifier exists in the Type 3 SCD. If the record does not exist, then it is added. If the record exists, then the following steps are performed:
The values of the versioned attributes are moved to the attributes that store the previous values of versioned attributes.
The record with the values from the source record is updated.
Set the Loading Type property of the Dimension operator to Load.
Validate the mapping by selecting the mapping in the Projects Navigator and clicking Validate on the toolbar.
Resolve errors, if any, that result from the validation process.
Generate the mapping by selecting the mapping in the Projects Navigator and clicking Generate on the toolbar.
The generation results are displayed in a new Results tab in the Log window. Resolve generation errors, if any.
Example: Loading Data into Type 3 SCDs
Figure 6-2 displays a mapping that loads data into the PRODUCTS level of the Type 3 SCD. In this mapping, the effective time of the current record is loaded from the source. You can also use the Default Effective Time of Current Record property to set a default value for the effective time of a level record.
You cannot map a data flow to the attributes that represent the previous values of versioned attributes.
For example, in the mapping shown in Figure 6-2, you cannot map an attribute to the PREV_PACK_SIZE
and PREV_DESCRIPTION
attributes of the PRODUCTS
level.
The Type 2 SCD PRODUCTS_TYPE2 contains the levels Total, Categories, and Product. Product is the leaf level and its attribute Pack_size is the versioned attribute. The Effective_date and expiration_date attributes store the effective date and expiration date, respectively, for the product records.
The source data that is to be loaded into this dimension is stored in two tables: Categories_tab
and Product_information
. The name and the description of the highest level in the Type 2 SCD are loaded using the Total_desc
attribute of a Constant operator.
Use the following steps to load the PRODUCTS_TYPE2 Type 2 SCD.
Define a mapping as described in "Defining Mappings".
From the Projects Navigator, drag and drop the PRODUCTS_TYPE2 Type 2 SCD, the Categories_tab table, and the Product_information table onto the mapping canvas.
Set the "Loading Type" property of the Dimension operator to Load.
Select the Dimension operator on the canvas. The Property Inspector displays the dimension properties. Loading Type is listed under the Dimension Properties node.
Drag and drop a Constant operator used to load the Total level onto the mapping canvas.
Also, add an output attribute called Total_desc to this operator. Set the Expression property of the Total_desc operator to the value assigned to the total_name and total_desc attributes.
Map the attributes from the source operators to the Type 2 SCD.
Figure 6-3 displays the mapping with the source operator mapped to the target. Type 2 SCD.
Figure 6-3 Loading Data Into a Type 2 SCD
In this example, the effective time of a record is loaded from the source. You can also choose to set this to a default value, such as SYSDATE
, using the Default Effective Time of Current Record property.
Because no values are explicitly assigned to the history logging properties, the default values are used.
Validate the mapping by selecting the mapping in the Projects Navigator and clicking Validate on the toolbar.
Resolve errors, if any, that result from the validation process.
Generate the mapping by selecting the mapping in the Projects Navigator and clicking Generate on the toolbar.
The generation results are displayed in a new Results tab in the Log window. Resolve generation errors, if any.
You have now designed a mapping that loads data into a Type 2 SCD. To actually move the data from the source tables into the Type 2 SCD, you must deploy and run this mapping. For more details about deploying and executing mappings, see "Starting ETL Jobs".
You can extract data stored in a workspace dimension, slowly changing dimension (SCD), or time dimension by using a Dimension operator as a source in a mapping.
Note:
You cannot extract data from dimensions that use a MOLAP implementation.Use the following steps to define a mapping that extracts data stored in dimensions.
Define a mapping as described in "Defining Mappings".
Add a Dimension operator to the mapping. Ensure that this operator is bound to the dimension from which you want to extract data.
For information about adding operators to mappings, see "Adding Operators to Mappings".
Add operators corresponding to the target object and to any transformations that must be performed before the dimension data is loaded into the target.
Map the attributes from the dimension levels to the target operator or to intermediate operators that transform the source data. Complete mapping all attributes according to your requirements.
Validate the mapping by selecting the mapping in the Projects Navigator and clicking Validate on the toolbar.
Resolve errors, if any, that result from the validation process.
Generate the mapping by selecting the mapping in the Projects Navigator and clicking Generate on the toolbar.
The generation results are displayed in a new Results tab in the Log window. Resolve generation errors, if any.
Use a mapping containing a Dimension operator to extract data stored in a Type 2 SCD. The Dimension operator should be bound to the Type 2 SCD that contains the source data.
Because a Type 2 SCD stores multiple versions of a single record, you must specify the version of the record that should be extracted. To extract the current version of a record, set the "Type 2 Extract/Remove Current Only" property of the Dimension operator to Yes. To extract all records, including historic ones, set the "Type 2 Extract/Remove Current Only" property to No.
Additionally, you can set the following properties for the Dimension operator: "Default Effective Time of Initial Record", "Default Effective Time of Open Record", "Default Expiration Time of Open Record", "Type 2 Gap", and "Type 2 Gap Units". These properties enable you to assign values to the versioned attributes in the Type 2 SCD. All these properties have default values as displayed in the Property Inspector for the Dimension operator. If you do not explicitly assign values to these properties, then Oracle Warehouse Builder uses the default values.
To define a mapping that extracts data from a Type 2 SCD:
Define a mapping as described in "Defining Mappings".
Add a Dimension operator to the mapping. Ensure that this operator is bound to the Type 2 SCD from which you want to extract data.
For information about adding operators to mappings, see "Adding Operators to Mappings".
Add operators corresponding to the target object and to any transformations that must be performed before the Type 2 SCD data is loaded into the target.
Map the attributes from the source Type 2 SCD to the target or to intermediate operators that transform the source data. Complete mapping all attributes according to your requirements.
To specify the version of source records from the Type 2 SCD that should be extracted:
Set the "Type 2 Extract/Remove Current Only" property to Yes to extract the current record.
Set the "Type 2 Extract/Remove Current Only" property to No to extract historic records.
Validate the mapping by selecting the mapping in the Projects Navigator and clicking Validate on the toolbar.
Resolve errors, if any, that result from the validation process.
Generate the mapping by selecting the mapping in the Projects Navigator and clicking Generate on the toolbar.
The generation results are displayed in a new Results tab in the Log window. Resolve generation errors, if any.
Use a mapping containing a Dimension operator to extract data from a Type 3 SCD. The operator must be bound to the Type 3 SCD that contains the source data.
A Type 3 SCD uses separate attributes to store historic values of versioned attributes. Depending on whether you want to extract the current record or historic values, you map the attributes in the Type 3 SCD to the target operators.
To define a mapping that extracts data from a Type 3 SCD:
Define a mapping as described in "Defining Mappings".
Add a Dimension operator to the mapping. Ensure that this operator is bound to the Type 3 SCD from which you want to extract data.
For information about adding operators to mappings, see "Adding Operators to Mappings".
Add operators for the target object and for any transformations that are needed before the dimension data is loaded into the target.
Map the attributes from the source Type 3 SCD either to the target operator or to operators that transform the source data.
To extract historic data, map the attributes that represent the previous values of versioned attributes to the target or intermediate operator.
To extract the current record, map the attributes that store the level attributes to the target or intermediate operator.
Validate the mapping by selecting the mapping in the Projects Navigator and clicking Validate on the toolbar.
Resolve errors, if any, that result from the validation process.
Generate the mapping by selecting the mapping in the Projects Navigator and clicking Generate on the toolbar.
The generation results are displayed in a new Results tab in the Log window. Resolve generation errors, if any.
Figure 6-4 displays a mapping that sources the historic data records from the PRODUCTS
Type 3 dimension. In this example, to source historic data, use the PREV_DESCRIPTION
, PREV_PACKAGE_TYPE
, or PREV_PACKAGE_SIZE
attributes. To source current data, use DESCRIPTION
, PACKAGE_TYPE
, or PACKAGE_SIZE
.
Figure 6-4 Mapping That Sources Data from a Type 3 SCD
Use the Dimension operator to remove data from dimensions and SCDs. You create a mapping with the Dimension operator, the source objects containing the data that must be removed from the dimension, and any required transformation operators. Map attributes from the source or transformation operators to the Dimension operator. When the map is run, the business identifier of the source record is compared to the business identifiers in the dimension. If the business identifiers match, then the corresponding record in the dimension is removed.
To remove data from dimensions or SCDs, set the "Loading Type" property of the Dimension operator to Remove.
Effect of Surrogate Keys on Dimension Data Removal
When you remove data from a dimension that was created with surrogate keys, parent records of existing children are removed, but child records are left referencing nonexistent parents.
When you remove data from a dimension that was created with no surrogate keys, parent records of existing child records and the child records are removed. This is in effect a cascade operation.
The DF1_SINGLEH1_SCD1 is a dimension containing the levels Total, Region, Territory, and Salesrep. This dimension contains existing data that was loaded earlier using another mapping. The tables WBSALESREP_ALL, WB_REGIONS, WB_TERRITORIES, and WBSALESREPTERRITORIES contain the data that must be removed from the various levels in the dimension.
Use the following steps to remove data from the DF1_SINGLEH1_SCD1 dimension.
Define a mapping as described in "Defining Mappings".
From the Projects Navigator, drag and drop the DF1_SINGLEH1_SCD1 dimension onto the mapping canvas.
Set the "Loading Type" property of the Dimension operator to Remove.
From the Projects Navigator, drag and drop the following tables onto the mapping canvas: WBSALESREP_ALL, WB_REGIONS, WB_TERRITORIES, and WBSALESREPTERRITORIES.
Add a Constant operator to the mapping. Create two output attributes ID and Name. To the Expression property of both attributes, assign the values to remove from the Total level.
For example, if you set the Expression attribute of ID to 100 and that of Name to Asia, then all level records, in the Total level, whose ID and Name match these values are deleted. Also, because the ID attribute of the Constant operator is mapped to the Region attribute, all child records of the Total level records that are removed are also removed.
Map the attributes from the source operators to the Dimension operator.
Figure 6-5 displays the mapping with the attributes connected to the Dimension operator.
Figure 6-5 Mapping that Removes Data From a Dimension
The business identifier of the source records is compared to the business identifier of the dimension level record to which it is mapped. If the business identifier matches, then the corresponding record is removed from the dimension.
Validate the mapping by selecting the mapping in the Projects Navigator and clicking Validate on the toolbar.
Resolve errors, if any, that result from the validation process.
Generate the mapping by selecting the mapping in the Projects Navigator and clicking Generate on the toolbar.
The generation results are displayed in a new Results tab in the Log window. Resolve generation errors, if any.
You have now designed a mapping that removes data from a dimension. To actually remove the specified data from the dimension, you must deploy and run this mapping. For more details about deploying and executing mappings, see "Starting ETL Jobs".
The Cube operator enables you to extract data from, load data into, and remove data from cubes. To extract data from cubes, use the Cube operator as a source in a mapping.
Note:
You cannot extract data from cubes that use a MOLAP implementation.Use the "Loading Type" property of the Cube operator to indicate if data is being loaded into the cube or removed from the cube. For cubes, the Loading Type property can have the following three values: LOAD, INSERT_LOAD, and REMOVE.
See Also:
"Loading Type" for more information about the Loading Type property of cubesACTIVE_DATE Attribute in Cubes
Cube operators contain an attribute called ACTIVE_DATE. This attribute represents the point in time that is used to determine which record in a Type 2 SCD is the active record. This property is applicable only when the cube that you are loading has one or more Type 2 SCDs.
If you do not map an attribute from the source to ACTIVE_DATE, SYSDATE
is used as the default.
If you map a source attribute to ACTIVE_DATE, the value of the source attribute is used to determine which version of the Type 2 SCD record is referenced by the cube record.For any cube that references a dimension in which the level is of a Type 2 SCD, the WHERE
clause generated to determine the dimension member is as follows:
... WHERE (... (<dim_name>.DIMKEY = <lookup_for_dimension_dimkey> AND (<level>_EFFECTIVE_DATE <= ACTIVE_DATE AND <level>_EXPIRATION_DATE >= ACTIVE_DATE) OR (<level>_EFFECTIVE_DATE <= ACTIVE_DATE AND <level>_EXPIRATION_DATE IS NULL)) ...)
Loading Data from Type 2 SCDs into Cubes
If a mapping that loads a cube references at least one Type 2 SCD that has the Default Expiration Time of Open Record set to a non-NULL value, then the ACTIVE_DATE attribute of the Cube operator must be mapped from the source that contains the date value that defines the range for the dimension record.
If the ACTIVE_DATE attribute is not mapped from the source, then the SYSDATE
value defines the date range for the dimension record.
When the ACTIVE_DATE attribute is mapped from the source, the source attribute value is used to perform a range comparison to determine which dimension record should be loaded.
The WHERE
clause listed previously describes the logic used to perform the lookup for the dimension member.
When you load a cube, you map the data flow from the source to the attribute that represents the business identifier of the referencing level. Oracle Warehouse Builder performs a lookup on the dimensions and then stores the corresponding surrogate identifier in the cube table. For example, when you map the attributes from the dimension operator to the cube operator, a Lookup operator is created in cases where it is needed to lookup the surrogate identifier of the dimension.
If there is a possibility of the lookup condition returning multiple rows, you must ensure that only one row is selected out of the returned rows. You can do this by using the Deduplicator operator or Filter operator.
Use the following steps to load data into a cube.
Define a mapping as described in "Defining Mappings".
Add a Cube operator to the mapping. Ensure that this operator is bound to the cube into which you want to load data.
For information about adding operators to mappings, see "Adding Operators to Mappings".
Add operators corresponding to the source objects and to any transformations that must be performed before the source data is loaded into the dimension. Ensure that all source data objects are bound to the repository objects.
Map the attributes from the source operators to intermediate transformation operators (if the source data is to be transformed before loading it into the dimension) and then to the target cube. Complete mapping all attributes according to your requirements.
Set the "Loading Type" property of the Cube operator to Load.
Validate the mapping by selecting the mapping in the Projects Navigator and clicking Validate on the toolbar.
Resolve errors, if any, that result from the validation process.
Generate the mapping by selecting the mapping in the Projects Navigator and clicking Generate on the toolbar.
The generation results are displayed in a new Results tab in the Log window. Resolve generation errors, if any.
Note:
If the source data for your cube contains invalid or null values for the dimension references, it is recommended that you use Orphan Management or DML Error Logging to avoid possible problems during subsequent cube loads.Figure 6-6 displays a mapping that uses the Cube operator as a target. Data from three source tables is joined using a Joiner operator. An Aggregator operator is used to aggregate the joined data with the data from another source table. The output of the Aggregator operator is mapped to the Cube operator.