Market Basket Analysis Overview (2024)

This chapter describes Market Basket Analysis (MBA) and contains the following sections:

Market Basket Functional Overview

Market Basket Analysis is a data mining technique that outputs correlations between various items in a customer's basket.

Market Basket Analysis reports are used to understand what sells with what, and includes the probability and profitability of market baskets. Such a report can be used to plan promotions, optimize product placement, and support store planogram decisions. These reports help you understand the statistical relationship between sales for different merchandise.

See the Oracle Retail Analytics User Guide for the Market Basket Reporting Overview.

Market Basket Data Mart Backup

Market Basket Analysis maintains a history of data mining results for a defined number of weeks. The number of weeks can be specified in the data mining configuration table W_RTL_DMS_CONFIG_G. For configuration details, see the Oracle Retail Analytics Installation Guide.

The MBA PLP programs are used to control the number of weeks in MBA tables. Once the number of weeks in those tables reaches the number that is specified in the configuration table, MBA PLP programs drop the earliest partition to keep the number of weeks in the target table from exceeding the maximum value.

Since this partition dropping is done automatically, you should back up all MBA tables on a weekly basis. Tables with partitions that are automatically dropped include:

W_RTL_MB_SLS_CL_WK_A
W_RTL_MB_SLS_DP_WK_A
W_RTL_MB_SLS_SC_WK_A
W_RTL_MB_SLS_SC_WK_SUP_A
W_RTL_MB_SLSPR_CUST_SC_WK_A
W_RTL_MB_SLSPR_SC_WK_A
W_RTL_MB_SLSPR_SC_WK_SUP_A

Reclassification Impact

The data mining process is performed on a weekly basis. The sales data is used as source data for the mining process for one week. This can improve performance for the data mining process and data mining ETL loading. However, if any product or organization reclassification occurs during the week, then the weekly mining may not have an accurate result. When a reclassification happens, the data mining process must use the sales data from the specified number of weeks instead of the current one week to get consistent and accurate results.

The RA_PROD_WEEKLY_RECLASS_IND and RA_ORG_WEEKLY_RECLASS_IND parameters indicate if there is any reclassification in the current week. The values of these two parameters are updated by the item and location daily ETL programs and are stored in the RA_C_ODI_PARAM table. These two parameters should not be updated by the batch user manually. In the mining configuration table (W_RTL_DMS_CONFIG_G), the restart history indicator for different mining (such as TOP10_RESTART_HIST_IND) and the mining start date wid for different mining (such as TOP10_MIN_DY_WID) are updated based on these parameters by the mining refresh program.

Market Basket Operations

Market Basket Analysis (MBA) helps you to find the relationship between items and groups of items in the basket of a customer. You can also use it to calculate a promotion-based historical baseline in order to provide insight into retail sale patterns and to improve your understanding of promotional effectiveness.

Market Basket Analysis Architecture

This section outlines the Market Basket Analysis architecture and its components. Review of the architecture is important in understanding the data flow.

Figure 7-1 shows the technical architecture of Market Basket Analysis.

Figure 7-1 Market Basket Analysis Technical Architecture

Market Basket Analysis consists of the following four components, which are discussed in the subsequent sections.

Service Manager
ETL Service
ARM Services
Baseline Services

Extract, Transform, Load

The ETL Service transforms the tables into different organized tables of MBA for efficient data mining.

Process Flow

MBA resides in a separate set of database schemas and uses ETL scripts to move input data from Retail Analytics into the MBA tables.

Figure 7-2 shows the ETL process.

Figure 7-2 Market Basket Analysis ETL Process Flow

ETL programs are added to transform Retail Analytics tables into different organized tables for organization and merchandise hierarchy, customer segments, and calendar for efficient lookups or aggregations for data mining.

The ETL scripts may be run weekly or during a scheduled time frame to refresh the data in the MBA schema before running reports as part of the weekly batch process. The results of any reports are prepared for consumption in Retail Analytics by a separate outbound ETL process that publishes results to predetermined output tables and materialized views, and cleans up any intermediate data in the MBA tables.

ETL Objects List

Table 7-1 lists the ETL objects.

Table 7-1 ETL Objects List

Program Name	Description	Source Table or File	Target Table
RSE_CORE_ETL. load_merch	Loading of the Merchandising data	W_RTL_DMS_PRODUCT_G	RSE_PROD_TC
RSE_CORE_ETL. load_org	Loading of the Organization data	W_RTL_DMS_ORG_DH_G	RSE_INT_ORG_TC
RSE_CORE_ETL. load_cur_cust_seg	Loading of the Customer Segment data	W_RTL_CUSTSEG_D	RSE_CUR_CUSTSEG_D
RSE_CORE_ETL. load_cur_cust_seg_xref	Loading the current classification of customers to customer segments	W_RTL_CUST_CUSTSEG_D	RSE_CUR_CUST_CUSTSEG_D
RSE_CORE_ETL. load_cal	Loading of the calendar data from RA	W_MCAL_DAY_D W_MCAL_WEEK_D	RSE_MCAL_DAY_WK_XREF (Materialized View)

ARM Services

ARM Services implements the mining services for Top 10 Product Affinities, Anchor Subclass Top Affinities, Anchor Customer Segment Promotion Affinities, Anchor Subclass Top Affinities by Promotions and Top 10 Promoted Subclass Affinities in order to find the desired association rules.

There are five services associated with ARM, which are discussed in the following sections:

Top 10 Product Affinities (TOP10)
Anchor Subclass Top Affinities (ANC_SC)
Anchor Customer Segment Promotion Affinities (CUST_PROMO)
Anchor Subclass Top Affinities by Promotions (ANC_SC_PROMO)
Top 10 Promoted Subclass Affinities (TOP10_PROMO)

Note that since TOP10 and ANC_SC use Oracle Data Mining (ODM), you must obtain ODM licenses prior to executing them in order to comply with the terms of the Oracle Licensing Policy. The CUST_PROMO, ANC_SC_PROMO and TOP10_PROMO do not use ODM.

Process Flow

Figure 7-3 shows the process flow diagram for the ARM Services.

Figure 7-3 Market Basket Analysis ARM Services Process Flow

ARM Generic Configuration Parameters

Table 7-2 lists the generic parameters that are used for all of the ARM services. They are configured in the table W_RTL_DMS_CONFIG_G, which is populated by an ETL process. The ARM_BATCH_DOP, ARM_WEEKLY_DOP, and ARM_HIST_NUM_OF_WEEK configurations should be determined during initial setup and may be adjusted for performance reasons.

To gain maximum performance benefit, consider configuring the parameters associated with parallel processing.

Table 7-2 ARM Generic Configuration Parameters

Parameter	Type	Configurable	Sample Data	Description
ARM_BATCH_DOP	NUMBER	Y	1	This is a degree of ARM data mining batches that can be simultaneously processed within each week.
ARM_HIST_NUM_OF_WEEK	NUMBER	Y	16	This is the number of weeks that the mining result history are held for.
ARM_WEEKLY_DOP	NUMBER	Y	1	This is a degree of weeks that can be simultaneously processed for ARM data mining when there is more than one week to be processed. This configuration should only be used if the database server is large enough to accommodate it.
ARM_WEEK_WID	NUMBER	Y	124020110038	This is the current business week number that the mining is executed for.
ARM_MIN_SUPPORT_TXN_CNT	NUMBER	Y	1000	This optional configuration provides a lower boundary for the minimum supporting transaction count for any of the affinity processes. This configuration prevents situations where the percentage based value in the other minimum confidence configurations results is too low of a value, if the transaction count fluctuates a lot.

Top 10 Product Affinities (TOP10)

The mining service identified as TOP10 is executed against all transactions at either subclass level, all class level, or all department level, based on the system options. The "IF" column can have multiple values up to 3. Navigation to a lower level against the "IF" column is available. Organization hierarchy and rollup are available from this mining output.

To set up the process queue:

begin rse_srvc_mgr.batch_pre_process('ARM', 'TOP10');end;

TOP10 Configuration Parameters

The following parameters, listed in Table 7-3, are used for the TOP10 service. They are configured in the table W_RTL_DMS_CONFIG_G, which is populated by an ETL process and can be modified if necessary.

For assistance in setting the configuration parameters for the TOP 10 Product Affinities, refer to the Oracle Retail Analytics Association Rule Mining of Market Basket Data for Retail Analytics Market Basket Analytics White paper (My Oracle Support Note #1469143.1) for guidance on use of a configuration tool. This tool tests various configurations with customer supplied data, in order to arrive at results which satisfy the customers reporting needs.

Table 7-3 Top 10 Service Configuration Parameters

Parameter	Type	Configurable	Sample Data	Description
TOP10_MAX_DY_WID	Number	N	124020100214000	This is the last date of data that Top 10 Product Affinities mining program looks at. In most cases, it is the last date of the current week.
TOP10_MAX_SET_SIZE	NUMBER	Y	4	This is the total number of items in the Rule for Top10 Product Affinities.
TOP10_MIN_CONFIDENCE_CLS	NUMBER	Y	0.05	This is the minimum confidence filter for affinities calculated at the Class level.
TOP10_MIN_CONFIDENCE_DEPT	NUMBER	Y	0.05	This is the minimum confidence filter for affinities calculated at the Department level.
TOP10_MIN_CONFIDENCE_SBC	NUMBER	Y	0.05	This is the minimum confidence filter when Top 10 Product Affinities is configured at the Subclass level.
TOP10_MIN_DY_WID	NUMBER	N	124020100214000	This is the first date of data that Top 10 Product Affinities mining program looks at. In most cases, it is the first date of current week. This is reset to the first day of the week, that is, the number of weeks that is defined in the parameter ARM_HIST_NUM_OF_WEEK, when there is a reclassification on the product hierarchy in the current week.
TOP10_MIN_REVERSE_CONFIDENCE_CLS	NUMBER	Y		This is the minimum reverse confidence filter for affinities calculated at the Class level.
TOP10_MIN_REVERSE_CONFIDENCE_DEPT	NUMBER	Y		This is the minimum reverse confidence filter for affinities calculated at the Department level.
TOP10_MIN_REVERSE_CONFIDENCE_SBC	NUMBER	Y		This is the minimum reverse confidence filter when Top 10 Product Affinities is configured at the Subclass level.
TOP10_MIN_SUPPORT_CLS	NUMBER	Y	0.0005	This is the minimum support filter for affinities calculated at the Class level.
TOP10_MIN_SUPPORT_DEPT	NUMBER	Y	0.0005	This is the minimum support filter for affinities calculated at the Department level.
TOP10_MIN_SUPPORT_SBC	NUMBER	Y	0.0005	This is the minimum support filter when Top 10 Product Affinities is configured at subclass level.
TOP10_PROD_HIER_LEVEL	VARCHAR2	Y	DEPT	This is the highest product hierarchy level at which the mining for Top 10 Product Affinities is executed. Valid value in ('SBC', 'CLS', 'DEPT').
TOP10_RESTART_HIST_IND	VARCHAR2	N	N	Valid value in ('Y', 'N').
TOP10_ MIN_SUPPORT_TXN_CNT_DEPT	NUMBER	Y	1000	This optional configuration overrides the ARM_MIN_SUPPORT_TXN_CNT configuration and provides a lower boundary for the minimum supporting transaction count for the Department level reporting.
TOP10_ MIN_SUPPORT_TXN_CNT_CLS	NUMBER	Y	1000	This optional configuration overrides the ARM_MIN_SUPPORT_TXN_CNT configuration and provides a lower boundary for the minimum supporting transaction count for the Class level reporting.
TOP10_ MIN_SUPPORT_TXN_CNT_SBC	NUMBER	Y	1000	This optional configuration overrides the ARM_MIN_SUPPORT_TXN_CNT configuration and provides a lower boundary for the minimum supporting transaction count for the Subclass level reporting.

Anchor Subclass Top Affinities (ANC_SC)

This mining service is identified as ANC_SC. Subclasses that are used for mining are stored in the configuration table W_RTL_DMS_ATTR_LIST_G. Only subclasses that are found in this table are included in the output of this process. The "IF" column can have multiple values up to two focused subclasses and one excluded subclass. The list of excluded subclasses is limited to those that also have been found to have associations. Organization hierarchy and rollup are not available from this mining output.

To set up the process queue:

begin rse_srvc_mgr.batch_pre_process('ARM', 'ANC_SC');end;

ANC_SC Configuration Parameters

The following parameters, listed in Table 7-4, are used for the ANS_SC service. They are configured in the table W_RTL_DMS_CONFIG_G, which is populated by an ETL process and can be modified if necessary.

Table 7-4 ANS_SC Service Configuration Parameters

Parameter	Type	Configurable	Sample Data	Description
ANC_SC_ATTR_LIST_SBC	VARCHAR2	N	ANC_SC_ATTR_LIST_SBC_NUM	This is an identifier of the subclass attribute list to be processed from W_RTL_DMS_ATTR_LIST_G.
ANC_SC_IF_HIER_LEVEL	VARCHAR2	N	SBC	This is the product hierarchy level for IF item for the mining for Anchor Subclass Top Affinities.
ANC_SC_MAX_DY_WID	NUMBER	N	124020100214000	This is the last date of data that Anchor Subclass Top Affinities mining program looks at. In most cases, it is the last date of the current week.
ANC_SC_MAX_SET_SIZE	NUMBER	Y	4	This is the total number of items in the Rule for Anchor Subclass Top Affinities.
ANC_SC_MIN_CONFIDENCE	NUMBER	Y	0.05	This is the minimum confidence filter for Anchor Subclass Top Affinities.
ANC_SC_MIN_DY_WID	NUMBER	N	124020100214000	This is the first date of data that Anchor Subclass Top Affinities mining program looks at. In most cases, it is the first date of current week. This is reset to the first day of the week, that is, the number of weeks ago that is defined in the parameter ARM_HIST_NUM_OF_WEEK, when there is a reclassification on product hierarchy in the current week.
ANC_SC_MIN_REVERSE_CONFIDENCE	NUMBER	Y		This is the minimum reverse confidence filter for Anchor Subclass Top Affinities.
ANC_SC_MIN_SUPPORT	NUMBER	Y	0.0005	This is the minimum support filter for Anchor Subclass Top Affinities.
ANC_SC_RESTART_HIST_IND	VARCHAR2	N	N	Valid value in ('Y', 'N').
ANC_SC_THEN_HIER_LEVEL	VARCHAR2	Y	CLS	This is the product hierarchy level for THEN item for the mining for Anchor Subclass Top Affinities. The valid value is in ('SBC', 'CLS', 'DEPT').
ANC_SC_MIN_SUPPORT_TXN_CNT	NUMBER	Y	1000	This optional configuration overrides the ARM_MIN_SUPPORT_TXN_CNT configuration and provides a lower boundary for the minimum supporting transaction count for the Anchor Subclass Top Affinities process.

Anchor Customer Segment Promotion Affinities (CUST_PROMO)

This mining service is identified as CUST_PROMO. The customer segment list that is used for mining is stored in the configuration table W_RTL_DMS_ATTR_LIST_G. The mining is executed against only one customer segment for each mining process, and only transactions that have that customer segment are used as source data. If any transaction has a promotion, the promotion event must in the range of the number of weeks that is defined in the parameter ARM_HIST_NUM_OF_WEEK. The "IF" column can have multiple values up to 3 subclasses. Organization hierarchy and rollup are not available from this mining output.

To set up the process queue:

begin rse_srvc_mgr.batch_pre_process('ARM', 'CUST_PROMO');end;

CUST_PROMO Configuration Parameters

The following parameters, listed in Table 7-5, are used for the CUST_PROMO service. They are configured in the table W_RTL_DMS_CONFIG_G, which is populated by an ETL process, and can be modified if necessary.

Table 7-5 CUST_PROMO Service Configuration Parameters

Parameter	Type	Configurable	Sample Data	Description
CUST_PROMO_ATTR_LIST_CUST_SEG	VARCHAR2	N	CUST_PROMO_ATTR_LIST_CUST_SEG_NUM	This is an identifier of the customer segment list to be processed from W_RTL_DMS_ATTR_LIST_G.
CUST_PROMO_CUST_SEG_RECLASS_IND	VARCHAR2	Y	N	This indicates if any customer changed segment during the week. It is only manually updated by the end user. Setting this to 'Y' will cause mining program to re-execute the mining process against the whole history of mining data.
CUST_PROMO_IF_HIER_LEVEL	VARCHAR2	N	SBC	This is the product hierarchy level for IF item for the mining for Anchor Customer Segment Promotion Affinities.
CUST_PROMO_MAX_DY_WID	NUMBER	N	124020100214000	This is the last date of data that Anchor Customer Segment Promotion Affinities mining program looks at. In most cases, it is the last date of the current week.
CUST_PROMO_MAX_SET_SIZE	NUMBER	Y	3	This is the total number of items in the Rule for Anchor Customer Segment Promotion Affinities.
CUST_PROMO_MIN_CONFIDENCE	NUMBER	Y	0.05	This is the minimum confidence filter for Anchor Customer Segment Promotion Affinities.
CUST_PROMO_MIN_DY_WID	NUMBER	N	124020100214000	This is the first date of data that Anchor Customer Segment Promotion Affinities mining program looks at. In most cases, it is the first date of current week. This will be reset to the first day of the week, that is, the number of weeks ago that is defined in the parameter ARM_HIST_NUM_OF_WEEK, when there is a reclassification of the product hierarchy in the current week.
CUST_PROMO_MIN_REVERSE_CONFIDENCE	NUMBER	Y		This is the minimum reverse confidence filter for Anchor Customer Segment Promotion Affinities.
CUST_PROMO_MIN_SUPPORT	NUMBER	Y	0.0005	This is the minimum support filter for Anchor Customer Segment Promotion Affinities.
CUST_PROMO_RESTART_HIST_IND	VARCHAR2	Y	N	Valid value in ('Y', 'N').
CUST_PROMO_THEN_HIER_LEVEL	VARCHAR2	N	SBC	This is the product hierarchy level for THEN item for the mining for Anchor Customer Segment Promotion Affinities.
CUST_PROMO_MIN_SUPPORT_TXN_CNT	NUMBER	Y	1000	This optional configuration overrides the ARM_MIN_SUPPORT_TXN_CNT configuration and provides a lower boundary for the minimum supporting transaction count for the Anchor Customer Segment Promotion Affinities process.

Anchor Subclass Top Affinities by Promotions (ANC_SC_PROMO)

This mining service is identified as ANC_SC_PROMO. The promotion event list and organization list at one hierarchy level that is used for mining is stored in the configuration table W_RTL_DMS_ATTR_LIST_G. The mining is executed against only one organization for each mining process, and only transactions that have that organization are used as source data. The rollup of these results to higher organization hierarchy levels can be done at reporting time. The "IF" column can have multiple values up to three subclasses.

To set up the process queue:

begin rse_srvc_mgr.batch_pre_process('ARM', 'ANC_SC_PROMO');end;

ANC_SC_PROMO Configuration Parameters

The following parameters, listed in Table 7-6, are used for the ANC_SC_PROMO service. They are configured in the table W_RTL_DMS_CONFIG_G, which is populated by an ETL process and can be modified if necessary.

Table 7-6 ANS_SC_PROMO Service Configuration Parameters

Parameter	Type	Configurable	Sample Data	Description
ANC_SC_PROMO_ATTR_LIST_ORG_DH	VARCHAR2	N	ANC_SC_PROMO_ATTR_LIST_ORG_DH_NUM	This is an identifier of the organization list to be processed from W_RTL_DMS_ATTR_LIST_G.
ANC_SC_PROMO_ATTR_LIST_PROMO_COMP	VARCHAR2	N	ANC_SC_PROMO_ATTR_LIST_PROMO_COMP_NUM	This is an identifier of the Promotion component list to be processed from W_RTL_DMS_ATTR_LIST_G.
ANC_SC_PROMO_IF_HIER_LEVEL	VARCHAR2	N	SBC	This is the product hierarchy level for IF item for the mining for Anchor Subclass Top Affinities by Promotions.
ANC_SC_PROMO_MAX_DY_WID	NUMBER	N	124020100214000	This is the last date of data that Anchor Subclass Top Affinities by Promotions mining program looks at. In most cases, it is the last date of the current week.
ANC_SC_PROMO_MAX_SET_SIZE	NUMBER	Y	3	This is the total number of items in the Rule for Anchor Subclass Top Affinities by Promotions.
ANC_SC_PROMO_MIN_CONFIDENCE	NUMBER	Y	0.05	This is the minimum confidence filter for Anchor Subclass Top Affinities by Promotions.
ANC_SC_PROMO_MIN_DY_WID	NUMBER	N	124020100214000	This is the first date of data that Anchor Subclass Top Affinities by Promotions mining program looks at. In most cases, it is the first date of current week. This will be reset to the first day of the week, that is, the number of weeks ago which is defined in the parameter ARM_HIST_NUM_OF_WEEK, when there is a reclassification on product hierarchy in the current week.
ANC_SC_PROMO_MIN_REVERSE_CONFIDENCE	NUMBER	Y		This is the minimum reverse confidence filter for Anchor Subclass Top Affinities by Promotions.
ANC_SC_PROMO_MIN_SUPPORT	NUMBER	Y	0.0005	This is the minimum support filter for Anchor Subclass Top Affinities by Promotions.
ANC_SC_PROMO_ORG_HIER_LEVEL	NUMBER	Y	REGION	This identifies the organization hierarchy level of this mining process. The valid values are in ('LOCATION', 'DISTRICT', 'AREA', 'CHAIN', 'REGION').
ANC_SC_PROMO_RESTART_HIST_IND	VARCHAR2	N	N	Valid value in ('Y', 'N').
ANC_SC_PROMO_THEN_HIER_LEVEL	VARCHAR2	N	SBC	This is the product hierarchy level for THEN item for the mining for Anchor Subclass Top Affinities by Promotions.
ANC_SC_PROMO_MIN_SUPPORT_TXN_CNT	NUMBER	Y	1000	This optional configuration overrides the ARM_MIN_SUPPORT_TXN_CNT configuration and provides a lower boundary for the minimum supporting transaction count for the Anchor Subclass Top Affinities by Promotions process.

Top 10 Promoted Subclass Affinities (TOP10_PROMO)

This mining service is identified as ANC_SC_PROMO. The mining process is executed against all transactions. The "IF" column can have multiple values up to 3 subclasses. Organization hierarchy and rollup are not available from this mining output.

To set up the process queue:

begin rse_srvc_mgr.batch_pre_process('ARM', 'TOP10_PROMO');end;

TOP10_PROMO Configuration Parameters

The following parameters, listed in Table 7-7, are used for the TOP10_PROMO service. They are configured in the table W_RTL_DMS_CONFIG_G, which is populated by an ETL process, and can be modified if necessary.

Table 7-7 TOP10_PROMO Service Configuration Parameters

Parameter	Type	Configurable	Sample Data	Description
TOP10_PROMO_ATTR_LIST_PROMO_COMP	VARCHAR2	N	TOP10_PROMO_ATTR_LIST_PROMO_COMP_NUM	This is an identifier of the Promotion component list to be processed from W_RTL_DMS_ATTR_LIST_G.
TOP10_PROMO_IF_HIER_LEVEL	VARCHAR2	N	SBC	This is the product hierarchy level for IF item for the mining for Current Top 10 Promoted Subclass Affinities.
TOP10_PROMO_MAX_DY_WID	NUMBER	N	124020100214000	This is the last date of data that Current Top 10 Promoted Subclass Affinities mining program looks at. In most cases, it is the last date of the current week.
TOP10_PROMO_MAX_SET_SIZE	NUMBER	Y	3	This is the total number of items in the Rule for Current Top 10 Promoted Subclass Affinities.
TOP10_PROMO_MIN_CONFIDENCE	NUMBER	Y	0.05	This is the minimum confidence filter for Current Top 10 Promoted Subclass Affinities.
TOP10_PROMO_MIN_DY_WID	NUMBER	N	124020100214000	This is the first date of data that Current Top 10 Promoted Subclass Affinities mining program looks at. In most cases, it is the first date of current week. This is reset to the first day of the week that is, the number of weeks ago that is defined in the parameter ARM_HIST_NUM_OF_WEEK, when there is a reclassification on product hierarchy in the current week.
TOP10_PROMO_MIN_REVERSE_CONFIDENCE	NUMBER	Y		This is the minimum reverse confidence filter for Current Top 10 Promoted Subclass Affinities.
TOP10_PROMO_MIN_SUPPORT	NUMBER	Y	0.0005	This is the minimum support filter for Current Top 10 Promoted Subclass Affinities.
TOP10_PROMO_RESTART_HIST_IND	VARCHAR2	N	N	Valid value in ('Y', 'N').
TOP10_PROMO_THEN_HIER_LEVEL	VARCHAR2	N	SBC	This is the product hierarchy level for THEN item for the mining for Current Top 10 Promoted Subclass Affinities.

Running ARM Services

The ARM Services have to be set up in the queue in the table RSE_SRVC_REQ_QUEUE the ARM Services can be run. The loading of the ARM configuration parameters and the execution of the various RSE_SRVC_MGR.BATCH_PRE_PROCESS steps as defined in the above sections should all be completed before processing the process queue, as shown below.

To run the process queue:

begin rse_srvc_mgr.process_queue('ARM');end;

ARM Services Objects Reference

Table 7-8 lists the input tables, output tables, and object types associated with ARM Services. Object types are user-defined extendable types, composed of two parts, attributes and methods, which enable encapsulation and code reuse during the various MBA operations. Refer to the Oracle Database Application Developer's Guide.

Table 7-8 ARM Services Objects

Service Name	Object Type	Major Source Tables (input)	Staging Tables (output)
TOP10	MBA_ARM_RTL_TOP10_T	W_RTL_SLS_TRX_IT_LC_DY_F	W_RTL_DMO_SLS_SC_WK_G, W_RTL_DMO_SLS_CL_WK_G, W_RTL_DMO_SLS_DP_WK_G
ANC_SC	MBA_ARM_RTL_ANC_SC_T	W_RTL_SLS_TRX_IT_LC_DY_F	W_RTL_DMO_SLS_SC_WK_SUP_G
CUST_PROMO	MBA_ARM_RTL_ANC_CUST_PROMO_T	W_RTL_SLS_TRX_IT_LC_DY_F	W_RTL_DMO_SLSPR_CUST_SC_WK_G
ANC_SC_PROMO	MBA_ARM_RTL_ANC_SC_PROMO_T	W_RTL_SLS_TRX_IT_LC_DY_F	W_RTL_DMO_SLSPR_SC_WK_SUP_G
TOP10_PROMO	MBA_ARM_RTL_TOP10_PROMO_T	W_RTL_SLS_TRX_IT_LC_DY_F	W_RTL_DMO_SLSPR_SC_WK_G

ARM Services Objects Types Hierarchy

Figure 7-4 shows the hierarchy of ARM Services object types.

Figure 7-4 Market Basket Analysis ARM Services Object Types Hierarchy

Attribute List Table: w_rtl_dms_attr_list_g

Table 7-9 contains the name and value of each attribute processed for the data mining process. The names of the attribute lists are defined in the table W_RTL_DMS_CONFIG_G.

Table 7-9 Data Mining Process Attributes

Attribute Description	Attribute Name (from W_RTL_DMS_CONFIG_G)	Attribute Value
Anchor Subclass: List of subclasses that are used for supervised non-promotion related product affinity.	Value defined for ANC_SC_ATTR_LIST_SBC parameter.	Mining program needs to join the W_RTL_DMS_PRODUCT_G to get attribute value.
Promotional Component: The list of promotions that are used for promotion-related supervised mining.	Value defined for TOP10_PROMO_ATTR_LIST_PROMO_COMP parameter.	Mining program needs to join the promotion table W_RTL_PROM_D to get attribute value.
Customer Segment: The list of customer segments.	Value defined for CUST_PROMO_ATTR_LIST_CUST_SEG parameter.	Mining program needs to join customer table to get relationship between customer and customer segment RSE_CUR_CUSTSEG_D.
Organization: The list of organization hierarchy values that are used for mining. The hierarchy level is defined in W_RTL_DMS_CONFIG_G table where PARA_NAME = 'MINING_ORG_LEVEL_PROMO_SUP.	Value defined for ANC_SC_PROMO_ATTR_LIST_ORG_DH parameter.	Mining program needs to join the W_RTL_DMS_ORG_DH_G and W_RTL_DMS_INT_ORG_DH_G to get value.
Anchor Subclass Promotion component: List of subclasses that are used for supervised promotion- related product affinity.	Value defined for ANC_SC_PROMO_ATTR_LIST_PROMO_COMP parameter.	Mining program needs to join the promotion table W_RTL_PROM_D to get value.

Table 7-10 lists the sample data for the attributes:

Table 7-10 Attribute Sample Data

Attribute Name	Attribute Value
ANC_SC_ATTR_LIST_SBC_NUM	57~26~5425
TOP10_PROMO_ATTR_LIST_PROMO_COMP_NUM	652663
CUST_PROMO_ATTR_LIST_CUST_SEG_NUM	FreqHigh
ANC_SC_PROMO_ATTR_LIST_ORG_DH_NUM	A4
ANC_SC_PROMO_ATTR_LIST_PROMO_COMP_NUM	652808

Baseline Services

The Baseline services process sales transaction from Retail Analytics into a suitable structure and calculates non-promoted baseline sales for items that are promoted.

Baseline is a process of calculating non-promoted baseline sales for items that are promoted. It transfers sales data by week from Retail Analytics tables, identifying the appropriate weeks that are suitable to be included in the baseline calculation.

For instance, item/location weekly sales are suitable for baseline calculation only if they does not have promotion sales for the week. In the case of overlapping promotions in a given week due to weekly sales by promotion transfer from Retail Analytics, baseline values split proportionally across the overlapping promotions using sales results from each promotion. Overlapping in promotions whenever the same item/location appears on more than one promotion for the same day within the week are also considered. In case of promotions that could run on the same week but different days, the baseline is adjusted only based on the number of days each promotion ran within the week.

Once the data is ready for calculation, the process performs the baseline calculation for all the promotions that have completed. Baseline results by promotion are saved to the baseline staging table W_RTL_DMO_SLS_BL_PC_IT_LC_WK_G. Upon completion of baseline processing, control is returned to Retail Analytics in order to perform data integration to the final target table W_RTL_SLSPR_BL_PC_IT_LC_WK_A.

There are two services associated with Baseline.

Gathering weekly sales data (GET_WK_SLS_DATA)
Calculating Baseline (CALC_PROMO_BL)

Since the Calculating Baseline (CALC_PROMO_BL Service) depends on the data that are gathered (GET_WK_SLS_DATA Service), the services should be executed in the specified order shown in the list.

Figure 7-5 shows the MBA Baseline Process flow diagram.

Figure 7-5 Market Basket Analysis MBA Baseline Process Flow

Running Promotion Baseline Service

Here is the method for the data gathering process of relevant sales data into MBA. This process must be completed successfully before you execute the CALC_PROMO_BL step.

To set up the process queue for gathering Baseline Weekly Sales Data:

begin rse_srvc_mgr.batch_pre_process('BL', 'GET_WK_SLS_DATA');end;

To process the service queue:

begin rse_srvc_mgr.process_queue('BL');end;

Upon successful completion of this process, the following can be run to calculate the Promotion Baseline data and publish it to the staging table.

To set up the process queue for Baseline calculation:

begin rse_srvc_mgr.batch_pre_process('BL', 'CALC_PROMO_BL');end;

To process the queue that was set for Baseline Calculation:

begin rse_srvc_mgr.process_queue('BL');end;

Baseline Configuration

Table 7-11 lists the configuration values that must be defined within the W_RTL_DMS_CONFIG_G table for the baseline processing. They are configured during initial configuration and are loaded by the ETL, and can be modified if necessary.

Table 7-11 Baseline Configuration Values

Parameter	Type	Required	Sample Data	Description
BL_WEEKS_TO_USE	NUMBER	Y	16	Number of weeks to use for baseline calculation. Number of sales data weeks to use before the promotion start is determined as CEIL (BL_WEEKS_TO_USE/2) and number of sales data weeks to use after the promotion ends is determined as TRUNC (BL_WEEKS_TO_USE/2).
BL_POST_PROMO_CALC_FREQ	NUMBER	Y	2	Baseline calculation always occur the week(s) the promotion is active, the week after the promotion completes, and at the end of the after promotion weeks. This parameter indicates the recalculation frequency for the weeks in the middle (end of promotion to end of post promotion weeks). That is, a value of 1 indicates a recalculation every week, a value of 2 indicates a recalculation every 2 weeks, and so on.
BL_WEEK_TO_PROCESS	NUMBER	Y	124020100003	Indicates "the" baseline processing week used to calculate the baseline using sales data around this week.
BL_WEIGHT_AGE_BASE	NUMBER	Y	0.5	This configuration value is used within the baseline calculation on this expression, where "a" corresponds to BL_WEIGHT_AGE_BASE, with a default value of 0.5, and the exponent "n" corresponds to the number of weeks between the promotional week and the adjacent weeks. The adjacent weeks are capped to the PRE and POST promotion number.
BL_WK_SALES_DATA_DOP	NUMBER	N	4	Number of recommended parallel threads to use for weekly baseline calculation. Weekly baseline calculation retrieves sales data from Retail Analytics for multiple weeks. This step can be parallelized by the use of this parameter.
BL_PROMO_CALC_DOP	NUMBER	N	4	Number of concurrent threads to use while doing promotion baseline calculation. This service can be parallelized by the use of this parameter.
BL_PRE_PROMO_WEEKS	NUMBER	N	5	This parameter can be used to specifically indicate the number of sales weeks prior to the promotions that should be used for baseline calculation. This parameter takes precedence over any other value used to identify the number of PRE/POST promo weeks.
BL_POST_PROMO_WEEKS	NUMBER	N	9	This parameter can be used to specifically indicate the number of sales weeks after the promotions that should be used for baseline calculation. This parameter takes precedence over any other value used to identify the number of PRE/POST promo weeks.
BL_REFRESH_PRE_PWEEKS	NUMBER	N	2	Baseline calculation extracts sales data from Retail Analytics and preserves the aggregated data within MBA for future use. The data can miss back posted and return transactions that might be relevant for future calculations. This parameter indicates the number of the most recent weeks prior to the processing week that needs to be refreshed. (For example, a value of 2 indicates the baseline process that the two most recent weeks need to be refreshed from Retail Analytics, a value of 0 indicates that historical data already available on MBA schema must be used, effectively ignoring returns and back posted data, and so on).

Parallelism

MBA provides two levels of parallelism in accessing the services with the help of the table RSE_SRVC_REQ_QUEUE table, the object type RSE_SRVC_T, and the RSE_SRVC_MGR package with the help of dbms_parallel_execute. The first level allows the processing of different Service Request Groups in parallel threads and the second level allows parallel processing of the different service requests within a given Service Request Group.

Parallel processing depends on how the parameters associated with parallel processing are configured in the W_RTL_DMS_ATTR_LIST_G table. For ARM Services, it depends on the degree of ARM data mining batches that can be simultaneously processed within each week (ARM_BATCH_DOP) and degree of weeks that can be simultaneously processed for ARM data mining (ARM_WEEKLY_DOP), whereas for Baseline, it depends on the number of parallel threads for sales data retrieval (BL_WK_SALES_DATA_DOP), for GET_WK_SLS_DATA service, and the number of parallel threads for calculating Baseline (BL_PROMO_CALC_DOP) for CALC_PROMO_BL (Refer Baseline Configuration).

The maximum number of parallel threads that can be used depends on the particular server and database configurations. Ultimately, the limit on the number of parallel threads that can be used is controlled by the database configuration for the maximum number of jobs allowed (job_queue_processes).

Enabling parallel processing of ARM at both levels should only be done in cases where there is a sufficiently large database server with sufficient IO capabilities. For example, if you specify two parallel weeks (ARM_BATCH_DOP =2)and three parallel batches (ARM_WEEKLY_DOP=3), then you need the support to run eight concurrent jobs (2 weeks + (2 weeks * 3 batches per week)) in the database. In addition, if parallel processing for Baseline is enabled, then support for that many threads of processing is also required. As Baseline service are serial (unlike ARM), the configuration of four (BL_WK_SALES_DATA_DOP=4 and BL_PROMO_CALC_DOP=4) would require a maximum of four concurrent jobs, and it would not compound like it does for ARM. In this specific case, if Baseline runs simultaneously with ARM, the database server needs to accommodate both needs, that is, for 12 (8 ARM + 4 Baseline) concurrent jobs.

Maintenance

This section contains information about MBA maintenance.

Purging ODM Models: purge_odm_models

This routine purges the ODM model and its supporting DB tables, which helps to remove any models that may have been left over from a failed process that was never fully recovered. To remove all leftover models related to the MBA processing, purge_odm_models can be invoked as shown below to remove all models that follow the specified naming pattern. The list of models eligible for purging can be obtained by querying the data dictionary view USER_MINING_MODELS in the MBA schema.

Table 7-12 Purging ODM Models

Input Parameter	Type	Description
p_odm_model_name	VARCHAR2	Name corresponding to the model to be removed. Optional parameter.

Example: To purge a ODM model named like 'TMP$ARM_BATCH'

begin rse_srvc_maint.purge_odm_models('TMP$ARM_BATCH%');end;

Market Basket Analysis Terms

This section defines some relevant terms in Table 7-13.

Table 7-13 MBA Terminology

Term	Definition
ARM	Association Rule Mining
DM	Data Mining
ODM	Oracle Data Mining algorithm
Antecedent	A set of item(s) that are frequently associated with another item. This is also known as the IF part of a Rule, or the Left Hand Side (LHS) of a rule.
Consequent	An item that has been found to be associated with another set of item(s). This is also known as the THEN part of a Rule, or the Right Hand Side (RHS) of a rule.
Support	The number of occurrences that the corresponding item(s) were found in a transaction.
Confidence	Expresses how likely the Consequent will be found in transactions which contain the Antecedent.
Lift	How many times more often the Antecedent and Consequent occur together then they would have if they were statistically independent.
BL	Baseline
ETL	Extraction, Transformation and Loading
DOP	Degree of Parallelism