The Cost-Based Query Optimizer
This section contains:
- The Query Plan
- Evaluate Information for Each Table
- Determining the Query Plan
- Optimizing Subqueries
- Query Plans for Sub-Queries
- Sub-Query Flattening
- Skip Duplicate Index Scan
- First Row/Semi Join Example
- First Row/Semi Join
- Predict Promotion Across Control Blocks
- Predict Promotion
- Viewing the Query Plan
- The Explain File
- Sequential Scan With Temporary Table
- Sequential Scan With Filter
- Key-Only Index Scan
- Index Scan With Lower Index Filter
- Index Scan: Lower and Upper Index Filters
- Dynamic Hash Join
- Hash Join: Parallel Scan and Sort Threads
- Key-First Index Scans
- Current SQL Information
- Current SET EXPLAIN Information
- Influencing the Optimizer
- OPTCOMPIND
- SET OPTIMIZATION
- Optimization LOW
- When To Try OPTIMIZATION LOW
- FIRST_ROWS
- What Makes IDS Directives Better?
- Types of Optimizer Directives
- Directives
- Directive Examples
- Access Method Directives
- Join Order Directive
- Join Method Directive
- Optimization Goal Directives
- EXPLAIN Directive
- Tips for Using Directives
The Query Plan
The query plan is the combination of the access plan(s) and join plan(s) that the optimizer chooses.

An optimal query plan will:
- Reduce the number of pages read
- Eliminate unnecessary sorts
For all database servers, a query plan is the road map that the optimizer chooses for retrieving the requested data. A good query plan allows the IDS engine to read the fewest pages possible, and eliminates sorts for ORDER BY and GROUP BY clauses by reading data in the order required.
Much like Oracle, IDS may choose from various methods for retrieving data from disk. The access methods used by IDS include:
- Sequential scan
This is equivalent to an Oracle table access full scan. The pages (blocks) containing rows for the table are read in physical order.
- Index scan
Oracle describes this as an index scan with table access by rowid. The server reads the index pages, applying filters where possible, and uses the ROWID values stored in the index to retrieve qualifying rows.
- Key-only index scan
When the data required to satisfy the query is contained within the index, IBM Informix retrieves the data requested from the index, eliminating the need to read the associated data pages. This behavior may be simulated in Oracle by setting the configurationparameter, FAST_ FULL_ SCAN_ ENABLED to TRUE, and using the INDEX_FFS hint to force an Oracle FAST FULL SCAN. (Oracle’s FAST FULL SCAN enforces a number of restrictions that do not apply to IBM Informix key-only index scans.)
- Key-first index scan
A key-first index scan uses index key filters, in addition to lower and upper filters to reduce the number of rows that must be read by a query.
- Auto-index scan
The auto-index scan is an IBM Informix exclusive feature, that allows the server to automatically build a temporary index on one or more columns used as query filters. The server will then read this temporary index to locate the data required. The index is only available for the query.
The auto-index feature of the IBM Informix Dynamic Server can be extremely beneficial to some OLTP batch activities. It allows the query to benefit from the index, but since the index is not maintained, extra writes are not required by later inserts, updates, and deletes from the table. In very active OLTP environments, the overhead of modifying indexes when a table has rows inserted, updated, or deleted can be significant. The two basic strategies that the IDS optimizer will choose for joining two tables are the nested loop join and the hash join
Evaluate Information for Each Table
- Examine selectivity of every filter
- Determine if indexes can be used for:
- Filters
- ORDERS BY or GROUP BY
- Find the best way to scan a table
Since query costs are largely determined by the number of rows that must be read from each table, the optimizer begins building its query plan by considering the conditional expressions in the WHERE clause to determine how many rows will qualify. These conditional expressions are often referred to as filters.
Each filter is examined to determine its selectivity. Selectivity is a number between 0 and 1 that indicates the fraction of rows the optimizer estimates will have to be read. If all rows meet the filter, the selectivity would be 1. A very selective filter will have a selectivity near 0. To determine the selectivity for a filter, the optimizer analyzes the data distributions for the column. If distributions are not available, the optimizer must apply a function to calculate the anticipated selectivity.
Next, the optimizer determines whether an existing index may be used to facilitate retrieval based on a filter condition, ORDER BY, or GROUP BY clause. Finally, the optimizer decides whether the table will be scanned sequentially or through an index.
FILTER SELECTIVITY ASSIGNMENTS

KEY:
indexed-col: first or only column in an index
2nd-max, 2nd-min: second largest and second smallest key values in
indexed column.
any-col: any column not covered by a preceding formula
To construct a query plan, the optimizer applies a bottom up, breadth first search algorithm. First, the optimizer considers the possible access path for each table, and the possible join methods for each pair of tables. Next, the optimizer may eliminate the more expensive of any two equivalent join pairs. For example, the table join (A x B) is equivalent to (B x A) if there are no ORDER BY or GROUP BY conditions on the columns of either table.
If the query joins three or more tables, this process is continued. The costs of a join are considered for each remaining join pair with each remaining table. Equivalent paths are eliminated. Finally, the optimizer selects the remaining join sequence with the lowest estimated cost.
Determining the Query Plan
The IDS optimizer first constructs all the possible join pairs by applying a bottom up, breadth first search.
Next, the optimizer:
- Considers the I/O and CPU costs associates with each access path and join pair, by evaluating:
- Table information
- Index information
- Distribution data, if available
- Eliminates the more expensive of any two equivalent join pairs
- Selects the lowest cost of the remaining join pairs
To construct a query plan, the optimizer applies a bottom up, breadth first search algorithm.
First, the optimizer considers the possible access path for each table, and the possible join methods for each pair of tables. Next, the optimizer may eliminate the more expensive of any two equivalent join pairs. For example, the table join (A x B) is equivalent to (B x A) if there are no ORDER BY or GROUP BY conditions on the columns of either table.
If the query joins three or more tables, this process is continued. The costs of a join are considered for each remaining join pair with each remaining table. Equivalent paths are eliminated.Finally, the optimizer selects the remaining join sequence with the lowest estimated cost
Optimizing Subqueries
# SQL statement with sub-query
select * from customer
where customer_num in
(select customer_num
from orders where order_date = TODAY);
# correlated subquery
select * from customer c
where exists (
select customer_num from orders where
orders.customer_num =
customer.customer_num and order_date =
TODAY );
A subquery is a SELECT statement that is contained in the WHERE clause of a SELECT, INSERT, UPDATE, or DELETE statement.A special type of subquery is a correlated subquery. A correlated subquery receives a value from the outer statement. Because it relies on a value passed from the outer statement, the inner statement must be executed once for every value retrieved by the outer SQL statement.
As you can see from the examples above, most subqueries can also be written as join statements. For example, the query:
select customer.* from customer, orders where customer.customer_num =
orders.customer_num and orders.order_date = TODAY;
retrieves the same data as the two examples above.
Query Plans for Sub-Queries
- Subquery flattening
- Optimizer converts subquery into a join
Special access plans:
- Skip duplicate index scan
- Return only unique entries from index scan
- First row scan
- Scan only until first match is found
- Semi-join
- Perform a nested loop join, with a first row scan on the inner table
To improve performance for subqueries, which are used extensively in some third party applications, IDS version 7.3 offers subquery flattening. Subquery flattening is simply a term for optimizing a SELECT statement containing a subquery by converting the subquery into a join.
When a subquery is transformed into a join, the optimizer may use one of three specialized access and join plans:
- Skip-duplicate-index scan
- An index scan is performed, but only unique key values are returned. If a duplicate key value has many associate row ids, successive row id values are ignored.
- First-row scan
- The table is scanned only until the first match is found.
- Semi-join
- A nested loop join is performed but the inner table is only scanned until the first match is found (first row scan on inner table).
The specialized access and join plans can deliver order of magnitude improvements over traditional subquery performance.
Sub-Query Flattening

In the example above, IBM Informix converts the subquery into a nest-loop join.Using traditional subquery optimization methods, the inner select would be performed once for every row retrieved by the outer query. If the orders table has 500,000 rows, the inner query is executed 500,000 times. Using subquery flattening, the query is transformed into a nested loop join between the customer and the orders tables. Each table is read (either sequentially or through an index) only one time.
Skip Duplicate Index Scan
QUERY:
------
update customer set cust_status = "C"
where exists
(select customer_num from orders where customer.customer_num =
orders.customer_num )
Estimated Cost: 10
Estimated # of Rows Returned: 23
- Administ.orders: INDEX PATH (Skip Duplicate)
- Index Keys: customer_num (Key-Only)
- Administ.customer: INDEX PATH
- Index Keys: customer_num
Lower Index Filter:
Administ.customer.customer_num= Administ.orders.customer_num
NESTED LOOP JOIN
The skip duplicate index scan prevents multiple index lookups on a secondary table for the same value. Consider the query shown in the slide. It is possible that multiple order records will exist for the same customer. This could result in repeated searches into the orders table for the same value and repeated updates to the same row.
The skip duplicate index scan ensures that only unique order_num values are returned, eliminating repeat scans and updates for a single row. The combination of a key-only scan and skip duplicate scan may result in a significant reduction in I/O.
First Row/Semi Join Example
Consider this example of a correlated sub-query:
update manufact set lead_time = '3'
where manu_name = “Smith”
and exists (
select 'x'
from stock
where stock.unit_price > 250.00
and manufact.manu_code = stock.manu_code)
Consider the query shown in the slide.
First Row/Semi Join
update manufact set lead_time = '3'
where manu_name = "Smith"
and exists (
select 'x'
from stock
where stock.unit_price > 250.00
and manufact.manu_code = stock.manu_code)
Estimated Cost: 4
Estimated # of Rows Returned: 2
- manufact: SEQUENTIAL SCAN Filters: Administ.manufact.manu_name = 'Smith'
- Administ.stock: INDEX PATH (First Row) Filters:
- Administ.stock.unit_price > $250.00
- Index Keys: manu_code Lower Index Filter: Administ.stock.manu_code=
Administ.manufact.manu_code
NESTED LOOP JOIN (Semi Join)
In this example, the optimizer will first perform a sequential scan of the outer table, customer. Using traditional subquery methods, the server would execute the inner query for each row retrieved by the sequential scan. Applying subquery flattening and the first row/semi join optimization, IDS can simply join the each row in the outer query to the first matching row returned by the inner query. Because not every qualifying row is read or joined, the query plan is referred to as a first row/semi join. The first row/semi join optimization has the potential to significantly reduce the number of reads required to execute the subquery.
Predict Promotion Across Control Blocks
select * from ps_jrnl_ln
where business_unit = 'ABC'
and process_instance = 5960
and not exists
( select "X"
from ps_bus_unit_tbl_gl P
where P.business_unit = ps_jrnl_ln.business_unit)
select * from ps_jrnl_ln
where business_unit = 'ABC'
and process_instance = 5960
and not exists
(select "X"
from ps_bus_unit_tbl_gl P
where P.business_unit = 'ABC')
Another enhancement provided for subquery optimization is predicate promotion across control blocks. Consider the correlated subquery shown in the slide. This query could easily be rewritten as a non-correlated subquery, as shown in the example.Sometimes, however, we do not have the option of rewriting the query because it is generated by a software package.IDS addresses these situations by employing the optimizer’s predicate promotion feature.
Predict Promotion
- informix.ps_jrnl_ln: INDEX PATH Filters: NOT EXISTS <subquery>
- Index Keys: process_instance
business_unit (Serial, fragments: ALL)
Lower Index Filter:
(informix.ps_jrnl_ln.business_unit = 'MFG' AND
informix.ps_jrnl_ln.process_instance = 1990 )
Subquery:
---------
- informix.p: INDEX PATH
- Index Keys: business_unit (Key-Only)
(Serial, fragments: ALL)
Lower Index Filter:
informix.p.business_unit = 'MFG'
Predicate promotion across control blocks simply means that, whenever possible, constant values are substituted in correlated subqueries so that the query, in effect, becomes a noncorrelated subquery. The query plan shown in the slide above, shows how the optimizer replaces the correlated join on ps_ jrnl_ ln.business_unit and ps_ bus_ unit_ tbl_gl.business_unitwith a filter on the literal or constant value, “ABC”. One customer benchmark of this type query demonstrated that this feature provided an order of magnitude improvement over traditional correlated subquery join performance.
Viewing the Query Plan

In Oracle, you have likely used the EXPLAIN PLAN statement to capture output describing decisions made by the optimizer. The IBM Informix counterpart is the SET EXPLAIN command. When you enable explain output by executing the SET EXPLAIN ON command, the server will produce a text file that contains:
- An estimate cost for the query plan
- The order in which the tables are accessed
- If temporary tables are needed to process the query
- The access method for each table
- The join method for each join pair.
The Explain File

On UNIX systems, the text file created is named sqexplain.out. If the client application issuing the SET EXPLAIN ON statement is located on the same computer as the database instance, the sqexplain.out file is created in the current working directory. If the client application is located a different computer, the sqexplain.out file is created in the user’s home directory on the dbms server machine.
NT
On NT systems, the output of the SET EXPLAIN ON statement is written to
%INFORMIXDIR%sqexplnusername.out.
Sequential Scan With Temporary Table
QUERY:
select * from stock order by description;
Estimated cost: 20
Estimated # of Rows Returned 74
Temporary Files Required For Order by
- client.stock: SEQUENTIAL SCAN
Even the simplest query is optimized in order to find the best access strategy. When a query path is chosen before a query is run, the statistics kept by the optimizer are put in the output file.
Estimated Cost
The estimated cost of each query is printed out. In the example above, the estimated cost is 20.
The units are not important, as this value is only used in comparison to other possible paths. It is important to understand this: the estimated cost is in no way useful for determining either how long the query will take or what the cost in resources will be. Its sole value is for comparison with alternative paths.
Estimated Rows Returned
The estimated number of rows to be returned is also printed out. Again, this is only an estimate, but does often come reasonably close to the actual number of rows returned. This estimate is most accurate when all filter and join conditions are associated with indexes, and when the statistics for the tables involved in the query are up to date.
Temporary File
It will also be reported when a temporary table or file is created for the query; the reason for the temporary file or table is given in this report. In the example above, we see that a sort was required to process the ORDER BY clause. The sort requires space to hold intermediate files. No temporary file is created if an index can be used to order the tuples. Only the selected path is reported via the SET EXPLAIN command; you cannot find out what alternate paths were considered.
Table Access Strategy
The access strategy for each table in the query is shown. In the example on the slide, the table will be accessed via a SEQUENTIAL SCAN, wherein the entire table is read from beginning to end. IBM Informix’s "SEQUENTIAL SCAN" access method is equivalent to Oracle’s "table access full" scan. When the query includes a WHERE clause, the expressions in that clause can either specify join conditions (indicating the columns that join the tables in the query) or filter conditions.
Sequential Scan With Filter
QUERY:
select * from stock where unit_price > 20;
Estimated cost: 5
Estimated # of Rows Returned: 25
- client.stock: SEQUENTIAL SCAN Filters: client.stock.unit_price > 20;
Filter
A filter condition is an expression that specifies a condition to be placed on the row to satisfy the query. When the optimizer chooses to access a table via a SEQUENTIAL SCAN and there are filter conditions placed on columns in that table, the filter is listed in the Filters section of the output. These are conditions that are placed on the rows after they have been read from the database but before they are returned to the user's program.
In the example above, all the rows in the table are read during the SEQUENTIAL SCAN. Before any row is returned to the user's program, however, the filter condition stock.unit_price > 20 is applied to the row. If it meets this condition, the row will be returned to the user; if not, the row is not considered for any further processing
Key-Only Index Scan
QUERY:
select max(order_num) from orders;
Estimated cost: 1
Estimated # Rows Returned: 1
- client.orders: INDEX PATH
- Index Keys: order_num (Key-Only)
(Aggregate)
When a query can take advantage of an index on one or more of the tables, the optimizer will choose to use one or more of these indexes to retrieve rows from the table.
Index Path
This type of access is known as an INDEX PATH. Generally, an INDEX PATH is the fastest access method, as it means you only have to look at rows that satisfy one or more of your filter conditions. The fewer rows that have to be read, the faster the query will run. When the optimizer chooses to use an index to access a table, it will print out the keys used for each index. This information is printed out as Index Keys for the table.
Key-Only Select
In some cases, all the data you want to retrieve from a table is contained in the index. In these situations, reading the data pages is unnecessary. In such a case, IDS will perform a key only index scan. A key only scan allows the server to read only the index pages. The I/O and CPU overhead associated with an unnecessary read of data pages is completely eliminated.
Index Scan With Lower Index Filter
QUERY:
select * from stock, items
where stock.stock_num = items.stock_num
and items.quantity >1;
Estimated cost: 14
Estimated # Rows Returned: 51
- client.items: SEQUENTIAL SCAN Filters: client.items.quantity > 1
- client.stock: INDEX PATH
- Index Keys: stock_num,manu_code
Lower Index Filter: (client.stock.stock_num =
client.items.stock_num)
When a query accesses several tables, the explain output will list the tables in the order in which they will be accessed.
Index Read Start and Stop Points
When an indexed search of the tables is performed, there is generally one of two conditions that define the indexed search: the start point and the stop point.
Lower Index Filter
When performing an indexed read, it is first necessary to position within the index to find the first key value. Once this position is found, the index can be read sequentially until the key value no longer meets the condition set. The condition that defines where to initially position in the index is called a Lower Index Filter. The explain output will include the Lower Index Filter for each index used, when appropriate.
Nested Loop Join
In the example above, the condition stock.stock_num = items.stock_num is used to start the index search on the stock table. For each items.stock_num value retrieved, an index search of the stock table will be performed using that as a key value (nested loop join). In some cases, it will be necessary to position in the index at the very beginning, that is, at the very first key in the index, then search the index in order up to a particular point, called the stop point.
Index Scan: Lower and Upper Index Filters
Query:
select * from customer
where customer_num between 104 and 111;
Estimated: 2
Estimated # Row Returned: 7
- client.customer INDEX PATH
- Index Keys: customer_num
Lower Index Filter: (client.customer.customer_num
>= 104)
Upper Index Filter:
(client.customer.customer_num <= 111)
Upper Index Filter
The condition that defines this stop point is known as an Upper Index Filter. When an index being used has a stop point associated with it, the SET EXPLAIN output will show that condition as an Upper Index Filter.
Some queries will have both Lower and Upper Index Filters for an indexed read. In such a case, one condition defines where to start the index search, and the other, where to stop the search. A typical example of a query that would use both Upper and Lower index filters is a query with a BETWEEN clause on an indexed column. Such a query is shown in the example above.
Dynamic Hash Join
Query:
select * from items, stock
where items.total_price = stock.unit_price;
Estimated cost: 11
Estimated # Rows Returned: 75
- informix.items: SEQUENTIAL SCAN
- informix.stock: SEQUENTIAL SCAN
DYNAMIC HASH JOIN (Build Outer)
Dynamic Hash Filters: informix.items.total_price =
informix.stock.unit_price
The above example shows a partial SET EXPLAIN output. The DYNAMIC HASH JOIN keywords indicate that a hash table will be built on one table and a hash join will be performed. It includes the filter that will be used for the join. By default, the hash table is built on the second table listed in the SET EXPLAIN output. If the term Build Outer is listed, the hash table is built on the first table listed.
Hash Join: Parallel Scan and Sort Threads
Query:
select sales_cd, prod_cd, manufact, company from
product, sales
where sales_cd in (’new’) and product.prod_cd =
sales.prod_cd
group by 2,3,1;
Estimated cost: 1088601
Estimated # Rows Returned: 7
- informix.product: SEQUENTIALSCAN (Parallel, fragments: ALL)
- informix.sales: SEQUENTIALSCAN filters: informix.sales.sales_cd IN(’new’)
DYNAMIC HASH JOIN
Dynamic Hash Filters: informix.product.prod_cd =
informix.sales.prod_cd
# of Secondary Threads:12
Particularly in DSS environments, the optimizer no longer assumes that index lookups are better than a table scan.
Dynamic Hash Join
Typically in DSS environments, large amounts of data are read and full table scans are required.
Hash joins can provide significant performance advantages over the other join methods, especially where the size of the join tables are very large. The DYNAMIC HASH JOIN keywords indicate that a hash join will be used. The output will further display what tables and filters will be used in the hash join.
Sequential Scan (Parallel, Fragments: All)
The SET EXPLAIN output indicates if a sequential scan of a fragmented table will be performed in parallel, and the number of fragments that will be read by the (Parallel, fragments: ALL) keywords. The ability to read table fragments in parallel can greatly increase query performance.
Key-First Index Scans
QUERY:
------
select * from customer where (customer_num > 120) and (
(customer_num = 122) or (customer_num = 123))
Estimated Cost: 1
Estimated # of Rows Returned: 1
- informix.customer: INDEX PATH
- Index Keys: customer_num (Key-First)
Lower Index Filter: informix.customer.customer_num
> 120
Key-First Filters: ((informix.customer.customer_num = 122
OR informix.customer.customer_num = 123 ) )
A key-first index scan uses other key filters, in addition to lower and upper filters, to reduce the number of rows read by a query. Using traditional index scan methods, an optimizer might apply the lower index filter of 120 then read all index keys and corresponding data pages with values greater than 120. Only after reading the data into memory, would the server filter out other non-qualifying rows (customer_num not equal to 122 or 123) before sending the data to the client.
By applying a key-first index scan, the optimizer applies all filter conditions (customer greater than 120 and equal to 122 or 123) prior to accessing the data pages. If a unique index is built on the customer_num column, the key-first index scan will eliminate all but two ROWIDs, those associated with customer_num values 122 and 123, and ensure that at most two data pages are read to resolve the query.
Current SQL Information
$ onstat -g sql
Informix Dynamic Server Version 7.30.TC3 -- On-Line --
Sess SQL Current Iso Lock SQL ISAM F.E.
Id Stmttype Database Lvl Mode ERR ERR Vers
34343 SELECT new_psoft NL NotWait 0 0 9.20
33022 SELECT stores7 CR NotWait 0 0 9.20
$ onstat -g sql 33022
Informix Dynamic Server Version 7.30.TC3 -- On-Line --
Sess SQL Current Iso Lock SQL ISAM F.E.
Id Stmt type Database Lvl Mode ERR ERR Vers
33022 SELECT stores7 CR Not Wait 0 0 9.20
Current statement name : _ixc7b
Current SQL statement :
select * from sysmaster:syssqexplain
The onstat -g sql command includes summary information about the last <k keyword>SQL statement executed by each session. The fields included in onstat -g sql are:
- Session Id - The session id of the user executing the SQL statement.
- Statement type - The statement type such as SELECT, UPDATE, DELETE, INSERT.
- Current Database - The name of the current database for the session.
- Isolation level - The current isolation level
- Lock mode - The current lock mode
- SQL ERR - The last SQL error.
- ISAM ERR - The last ISAM error.
- F.E. Vers - The IBM Informix version of the client application.
To retrieve more detailed information about the a specific session, use
onstat -g sql session_id
You may also retrieve a listing of active sql sessions by executing the command:
select * from sysmaster:syssqlcurses;
or
select * from sysmaster:syssqlcurses
where scs_session = session_id;
Current SET EXPLAIN Information
To retrieve explain output using SQL execute:
SELECT * FROM sysmaster:syssqlexplain WHERE sqx_session
= session_id;
As a former Oracle DBA, you may be more accustomed to using the plan_table and SQL to store and retrieve explain plan information. To access IBM Informix explain output using SQL, query the syssqexplain table in the sysmaster database. To retrieve information for a specific session, be sure to filter on the sqx_session column, which stores the session id.
Influencing the Optimizer
- OPTCOMPIND
- Configuration parameter
- Environment variable
- SET OPTIMIZATION
- High/Low
- All Rows/First Rows
- OPTIMIZER DIRECTIVES
The IDS optimizer is a very sophisticated tool and will generally choose the most efficient query plan available. However, a few tools are also available for influencing the optimizer choices when appropriate.
OPTCOMPIND

The OPTCOMPIND configuration parameter allows you to influence the optimizers behavior for all queries executing against the server. You can override this global behavior by setting the OPTCOMPIND environment variable in a sessions environment.
The OPTCOMPIND value will influence both the access plan and join plan chosen by the optimizer. An OPTCOMPIND setting of 0 or 1 with an active isolation level of Repeatable Read will instruct the optimizer to consider:
- only index scan access paths, when an index is available.
- a sequential scan access path only when no index is available
- nested loop joins
When OPTCOMPIND is 2, or OPTCOMPIND is 1 and the isolation level is not repeatable read, the optimizer chooses the lowest cost path. An OPTCOMPIND value of 2 is the default onconfig.std setting
SET OPTIMIZATION
- Default values are HIGH and ALL_ROWS
- Duration is for the process or until next SET OPTIMIZATION statement
- Set one option per statement, use two statements if needed

The SET OPTIMIZATION SQL statement allows you to specify the optimization goal and time that the optimizer will spend considering alternative query paths.
FIRST_ROWS Versus ALL_ROWS
The goal, in optimizing the query, may be to retrieve the first buffer of rows as quickly as possible, or to retrieve all rows in the quickest manner. If the application is an end-user query tool, we might choose FIRST_ROWS optimization. Perhaps, our end-user is a financial analyst performing a series of what-if scenarios. To run each scenario, he submits a query that retrieves a large number of rows. After viewing just a few rows, he realizes that this scenario will not produce the result he is looking for and moves on to the next query. By choosing FIRST_ROWS optimization, the end user may receive a quicker response from the database server, which in turn, may allow him to be more productive.
For a batch application, however, that must process payroll updates for several thousand employees, ALL_ROWS optimization will probably be the most desirable optimization method and will likely produce the best performance
HIGH vs. LOW
The HIGH and LOW options influence how much time the optimizer will spend analyzing query paths. The HIGH option instructs the optimizer to use a sophisticated algorithm to examine all reasonable query plans and select the best alternative.
The LOW option uses a less sophisticated, but faster algorithm. This algorithm eliminates more of the join options earlier in the optimization phase, reducing the time spent analyzing possible paths.
Optimization LOW
select * from a,b,c,d where a.a = b.a and
b.b = c.b and c.c = d.c

The example above shows how the optimizer might optimize a query when optimization is set to LOW. At each level (two-way join, three-way join, four-way join), the lowest cost join is chosen and the other paths are not examined further.
In the optimization LOW example above, ac is chosen as the least cost two-way join. The other two-way joins are not examined any further. Next, the three-way joins possible from the ac join are examined. Again, only the least cost join is followed down to the next level. As you can see, the number of joins that must be examined is drastically reduced
When To Try OPTIMIZATION LOW
When to try SET OPTIMIZATION LOW:
- Only if the query time is unacceptable
- For queries that join five or more tables
- All join columns are indexed
- One table in the query is joined to all other tables in the query
You should only try SET OPTIMIZATION LOW when your query time is unacceptable. Since there is no way to determine the percentage of query time that was spent on query optimization, you have to make best guess decisions. Some unique types of queries that improve the chance of using the SET OPTIMIZATION LOW algorithm will choose the best query plan:
- When the SELECT statement includes several tables and the number of possible join combination is very high.
- When all join columns are indexed, the optimizer has a better chance of choosing an appropriate path even when using the LOW algorithm.
There is no sure formula for knowing when SET OPTIMIZATION LOW will give you better results, except by testing the query with both settings.
FIRST_ROWS
Useful for end-user query applications:
- SET OPTIMIZATION
FIRST_ROWS
- OPT_GOAL
- ONCONFIG parameter
- Environment variable
The SET OPTIMIZATION FIRST_ROWS statement is very useful for decision support environments and on-line query and reporting activities. It instructs the optimizer to choose a query path that will return the first buffer of rows most quickly, even if the time for retrieving all rows increases.
When you use the SQL statement:
SET OPTIMIZATION FIRST_ROWS;
the optimization goal remains in effect until the end of the process or until you execute a statement to set ALL_ROWS optimization. You can also specify first rows optimization as the default for your instance by setting the OPT_GOAL parameter in your ONCONFIG file; for example:
# Optimization goal: -1 = ALL_ROWS(Default), 0 = FIRST_ROWS
OPT_GOAL -1
Alternatively, you can set an optimization goal for a particular user’s environment using the environment parameter OPT_GOAL.
# ksh
export OPT_GOAL=0
If you only want to use FIRST_ROWS optimization for a single query, you may use the optimizer directive FIRST_ROWS; for example:
SELECT --+ FIRST_ROWS
fname, lname FROM customer
ORDER BY sname
What Makes IDS Directives Better?
- Positive Directives
- Negative Directives

Unlike other database servers, IDS provides both positive and negative directives. A positive directive instructs the optimizer to limit its choice to a certain set of paths. A negative direction instructs the optimizer only to avoid certain less-than optimal paths. The optimizer is still free to consider all other paths. If a new index provides an improved path, the directive does not need to be changed. The optimizer is automatically free to choose the new path.
If the outer circle is the set of all possible query paths, a positive directive would instruct the optimizer to limit considered paths to those in the inner circle. A negative direction would allow the optimizer to choose any path in the set defined by the outer circle, but not the inner circle.
Types of Optimizer Directives
- Access method directives
- Join order directives
- Join method directive
- Optimization goal directives
- Explain directive
Optimizer directives are a new feature of IBM Informix Dynamic Server that allows the query developer to influence the optimizer in the creation of a query plan for an SQL statement.
Optimizer directives provide the flexibility to direct the optimizer to follow specific paths rather than choosing a plan through its analysis. This can result in reduced time for correcting performance problems. Rewriting queries can be very time-consuming, and this provides a quick way to alter a plan and test it.
Directives support control in the following areas of the optimization process:
- Access plans - index vs. scans
- Join plans - forcing hash joins or nested loop joins
- Join order - specify order in which tables are joined
- Optimization goal - for faster response time vs. throughput
EXPLAIN - generates query plan output
Directives
- Can be used:
- In SELECT, UPDATE, and DELETE statements
- In SELECT statements embedded in INSERT statements
- In Views, Stored Procedures and Triggers
- Cannot be used:
- In distributed queries that access remote tables
- For UPDATE/DELETE WHERE CURRENT OF statements
- Multiple directives may be used within the same comment block.
Support for directives is provided for all SELECT, UPDATE, and DELETE statements. They may appear in stored procedures, triggers, and views. They may also appear in SELECT statements embedded in INSERT statements.
Directives will not be valid for distributed queries and for UPDATE/DELETE statements WHERE CURRENT of cursor statements. Directives will be processed by default. A configuration parameter, DIRECTIVES, can be set to 0 (OFF) to disable the processing of directives. The environment variable IFX_DIRECTIVES can also be set to ON or OFF to control whether directives are processed.
Directive Examples
SELECT --+ORDERED AVOID_FULL( a )
*
FROM ps_jrnl_header a, ps_jrnl_ln b
WHERE a.business_unit = "F01"
AND a.business_unit =
b.business_unit
AND a. journal_id = "950930LOAD"
AND a.journal_id = b.journal_id
AND a.journal_date = "1995-09-30"
AND a.journal_date =
b.journal_date
AND a.unpost_seq = 0
AND a.unpost_seq = b.unpost_seq;
Directives are written as a comment whose first character is a + (plus sign). Valid syntax for comments include:
--+directive text
{+directive text}
/*+directive text */
To allow directives to be used in IBM Informix ESQL products, their behavior has been changed to pass comments containing directives to the server instead of stripping them out.
Directives can also be written to tell the optimizer what to AVOID, rather than what to choose. This allows you to write a directive to avoid certain actions that are known to cause performance issues for a query. The optimizer would still be able to explore using any new indexes or table attributes as they are added.
Access Method Directives
- INDEX
- AVOID_INDEX
- FULL
- AVOID_FULL
The four access method directive provided by IDS include:
- INDEX
INDEX supports zero or more indexes to be specified. If one index is specified, that index will be used. If more than one index is specified, the index used will be chosen from that list, based on the cost. If no specific indexes are named, then all indexes will be considered. A sequential table scan is never considered unless the table has no indexes.
Example:
SELECT --+ INDEX (e salary_indx)
name, salary
FROM emp e
WHERE e.dno = 1
AND e.salary > 50000;
- AVOID_INDEX
AVOID_INDEX accepts zero or more indexes which will NOT to be to access the table. This method allows the optimizer to consider indexes that are added after the query is written while avoiding known indexes that will slow down the query.
- FULL (tablename)
FULL forces the optimizer to perform a sequential scan on the table specified, even if an index exists on a column.
Example:
SELECT --+FULL(e)
name, salary
FROM emp e
- AVOID_FULL (tablename)
AVOID_FULL will force the optimizer to avoid a full-table scan of the specified table if indexes exist. If no index exists, the optimizer performs a full-table scan.
This directive might be used with REPEATABLE READ isolation level, for example, to avoid the full table scan and subsequent locking.
Multiple directives can be used as long as they are in the same comment block.
Example:
SELECT --+AVOID_FULL(e),INDEX (e salary_indx)
name, salary
FROM emp e
WHERE e.dno = 1
AND e.salary > 5000
Join Order Directive
Use the ORDERED directive to access tables in FROM clause order.
select --+ORDERED
sum(monetary_amount)
from ps_jrnl_header a, ps_jrnl_ln b
where a.business_unit = b.business_unit
and a.journal_id = b.journal_id
and a.journal_date = b.journal_date
and a.unpost_seq = b.unpost_seq
and a.business_unit = “F01”
and a.journal_date = “1995-09-30”
The ORDERED directive forces the optimizer to join tables in the order in which they appear in the FROM clause.The ORDERED directive is especially beneficial for applications originally written for rules based optimizers. Because a rules based optimizer typically joins tables according to their order in the from clause, developers familiar with rules based optimizers will build SQL statements accordingly.
Join Method Directive
The join method directives allow you to influence whether the optimizer will perform a nested loop or hash join.
- USE_NL
- AVOID_NL
- USE_HASH
- AVOID_HASH
The join method directives allow you to influence when the optimizer will or will not perform nested loop and hash joins. It also allows you to determine which tables will be inner tables in the nested loop joins and which tables will be hashed or probed when hash joins are used.
The join method directives include:
- USE_NL (tablename)
The USE_NL directive may be used to force a nested loop join. In a nested loop join, each row in the outer table is used to probe the inner table to find matching rows. The joined rows are returned as the probe result. Access to the inner table can be a scan, an existing index, or a dynamically built index. USE_NL takes table names as arguments. The maximum number of tables specified is one less than the total number of tables because one table has to be used as the outer table for the sequence of nested loops.
Example:
SELECT --+USE_NL (dept)
name, title, salary, dname
FROM emp, dept, job
WHERE loc = “Palo Alto”
AND emp.dno = dept.dno
AND emp.job = job.job
This will cause the optimizer to use a nested loop join to join the department table with the other tables in the query. The department table will be the inner table of the join.
- AVOID_NL (tablename)
AVOID_NL can be used to force the optimizer to avoid a nested loop join on the specified table(s).
- USE_HASH (tablename)
Use the USE_HASH directive to force a hash join. In a hash join the rows of one of the tables is used to build a hash table for each element. Then, the second table is processed by probing, directly hashing the probed value into the hashed table and checking for a join pair. The USE_HASH directive without any arguments will direct the optimizer to join tables using a hash join; determining the order of that join by cost. It is possible to specify which table is to be the build table or probe table within the directive.
Example:
SELECT --+USE_HASH (dept/BUILD)
name, title, salary, dname
FROM emp, dept, job
WHERE loc = “Palo Alto”
AND emp.dno = dept.dno
AND emp.job = job.jo
This directive causes the optimizer to join the dept table using a hash join and to build the hash table on the dept table.
- AVOID_HASH (tablename)
AVOID_HASH forces the optimizer to avoid hash joins of the listed table. You can optionally restrict the table from being the probe table or the build table
Optimization Goal Directives
FIRST_ROWS
The FIRST_ROWS directive is extremely useful for developers and end-users alike. For developers, it is a simple tool that may be used in conjunction with the SQL FIRST keyword, to quickly access a small sample set of data from a very large table.
Additionally, if developers add the FIRST_ROWS optimization directive to screen populating queries in end-user reporting tools, end-users will perceive a quicker response from the system. Quicker system response means happier users!
EXPLAIN Directive
select --+INDEX( ps_ledger pscledger), EXPLAIN
fiscal_year, business_unit, sum( posted_total_amt)
from ps_ledger
group by fiscal_year, business_unit, posted_total_amt
DIRECTIVES FOLLOWED:
INDEX( ps_ledger pscledger)
EXPLAIN
DIRECTIVES NOT FOLLOWED:
Estimated Cost: 45843
Estimated # of Rows Returned: 11362
Temporary Files Required For: Group By
- informix.ps_ledger: INDEX PATH
- Index Keys: fiscal_year ledger business_unit account
(Serial, fragments: ALL)
The EXPLAIN directive allows you to turn on the SQL explain feature, directly from the query. The EXPLAIN feature is very helpful for testing directives.
Tips for Using Directives
Apply the following guidelines when using directives in your queries:
- Frequently examine the effectiveness of a directive.
- Changes to data values or indexes may alter the best path for a query.
- Use negative directives whenever possible.
- Negative directives are less limiting.
If changes to data values or indexes favor a different path, the query may still be free to choose this path.
These guidelines may be helpful when determining how and where to use directives in your queries:
- Re-evaluate the effectiveness of a directive frequently.
- Changes to the table structure, available indexes, data distribution, etc., may change how the query should be optimized.
- Whenever possible, use negative directives as opposed to positive directives.
While a positive directive will limit the optimizer to one choice, such as always perform an index scan on this table, a negative directive will only exclude a poor choice, leaving all other options available to the optimizer.
|