Oracle® XML DB Developer's Guide 11g Release 2 (11.2) E23094-03
This chapter describes full-text search over XML using Oracle. It explains how to use Oracle SQL function contains and Oracle XPath function ora:contains. These are the two functions used by Oracle Database to do full-text search over XML data.
See Also:
Oracle Text Reference and Oracle Text Application Developer's Guide for more information about Oracle Text
Oracle supports full-text search on documents that are managed by the Oracle Database.
If your documents are XML, then you can use the XML structure of the document to restrict the full-text search. For example, you may want to find all purchase orders that contain the word "electric" using full-text search. If the purchase orders are in XML form, then you can restrict the search by finding all purchase orders that contain the word "electric" in a comment, or by finding all purchase orders that contain the word "electric" in a comment under line items.
If your XML documents are of type XMLType, then you can project the results of your query using the XML structure of the document. For example, after finding all purchase orders that contain the word "electric" in a comment, you may want to return just the comments, or just the comments that contain the word "electric".
Full-text search differs from structured search or substring search in the following ways:
A full-text search looks for whole words rather than substrings. A substring search for comments that contain the string "law" can return a comment that contains "my lawn is going wild". A full-text search for the word "law" cannot.
A full-text search supports some language-based and word-based searches that substring searches do not. You can use a language-based search, for example, to find all the comments that contain a word with the same linguistic stem as "mouse", and Oracle Text finds "mouse" and "mice". You can use a word-based search, for example, to find all the comments that contain the word "lawn" within 5 words of "wild".
A full-text search generally involves some notion of relevance. When you do a full-text search for all the comments that contain the word "lawn", for example, some results are more relevant than others. Relevance is often related to the number of times the search word (or similar words) occur in the document.
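The following sketch contrasts these kinds of search on the chapter's purchase_orders table. It assumes a CONTEXT index already exists on column doc; the operators shown ($ for stemming, NEAR for proximity) are Oracle Text query operators, and Oracle Text Reference has their full syntax.

```sql
-- Substring search: also matches "my lawn is going wild", because "law" occurs inside "lawn".
SELECT id FROM purchase_orders WHERE doc LIKE '%law%';

-- Full-text search: matches only the whole word "law".
SELECT id FROM purchase_orders WHERE contains(doc, 'law') > 0;

-- Language-based search: words with the same linguistic stem as "mouse".
SELECT id FROM purchase_orders WHERE contains(doc, '$mouse') > 0;

-- Word-based search: "lawn" within 5 words of "wild".
SELECT id FROM purchase_orders WHERE contains(doc, 'NEAR((lawn, wild), 5)') > 0;
```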
XML search is different from unstructured document search. In unstructured document search you generally search across a set of documents to return the documents that satisfy your text predicate. In XML search you often want to use the structure of the XML document to restrict the search. And you often want to return just the part of the document that satisfies the search.
There are two ways to do a search that includes full-text search and XML structure:
Include the structure inside the full-text predicate, using Oracle SQL function contains:
WHERE contains(doc, 'electric INPATH (/purchaseOrder/items/item/comment)') > 0
Function contains is an extension to SQL, and can be used in any query. It requires a CONTEXT full-text index.
Include the full-text predicate inside the structure, using XPath function ora:contains:
'/purchaseOrder/items/item/comment[ora:contains(text(), "electric")>0]'
XPath function ora:contains is an extension to XPath, and can be used in a call to SQL/XML function XMLQuery, XMLTable, or XMLExists.
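As a sketch of the second approach, the following query uses ora:contains inside XMLTable to return each matching comment as a relational row. It assumes that table purchase_orders_xmltype has an id column and an XMLType column doc; the column alias comment_text is illustrative only.

```sql
-- Each item comment that contains the word "electric", one row per comment.
SELECT po.id, c.comment_text
  FROM purchase_orders_xmltype po,
       XMLTable(XMLNamespaces('http://xmlns.oracle.com/xdb' AS "ora"),
                '/purchaseOrder/items/item/comment[ora:contains(text(), "electric") > 0]'
                PASSING po.doc
                COLUMNS comment_text VARCHAR2(4000) PATH '.') c;
```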
This section describes details about the examples included in this chapter.
To run the examples, you need database roles CTXAPP, CONNECT, and RESOURCE. You must also have EXECUTE privilege on the CTXSYS package CTX_DDL.
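One way to set up such an account is sketched below. It must be run by a suitably privileged user, and po_user is a hypothetical account name used only for illustration.

```sql
-- Run as a DBA or another user allowed to grant these roles and privileges.
-- po_user is a hypothetical account name; substitute your own.
GRANT CTXAPP, CONNECT, RESOURCE TO po_user;
GRANT EXECUTE ON CTXSYS.CTX_DDL TO po_user;
```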
Examples in this chapter are based on "The Purchase Order Schema", W3C XML Schema Part 0: Primer (see http://www.w3.org/TR/xmlschema-0/#po.xml).
The data in the examples is from the document "Purchase-Order XML Document, po001.xml".
The tables used in the examples of this chapter are defined in section "CREATE TABLE Statements". Some of the performance examples, however, are based on a larger table (purchase_orders_xmltype_big), which is included in the downloadable version only.
Some of the examples here use data type VARCHAR2. Others use type XMLType. All examples that use VARCHAR2 also work with XMLType.
Oracle SQL function contains returns a positive number for rows where [schema.]column matches text_query. Otherwise, it returns zero. It requires an index of type CONTEXT. If there is no CONTEXT index on the column being searched, then contains raises an error.
contains([schema.]column, text_query VARCHAR2 [,label NUMBER]) RETURN NUMBER
Example 12-1 shows a typical query that uses Oracle SQL function contains. It returns the id for each row in table purchase_orders where the doc column contains the word "lawn" and id is less than 25.
Example 12-1 Simple Query using Oracle SQL Function CONTAINS
SELECT id FROM purchase_orders WHERE contains(doc, 'lawn') > 0 AND id < 25;
Suppose doc is a column that contains a set of XML documents. You can do full-text search over doc, using its XML structure to restrict the query. The query in Example 12-2 returns id values for table purchase_orders where column doc contains the word "lawn" in the text() node of XML element comment.
Example 12-2 Restricting a Query using CONTAINS and WITHIN
SELECT id FROM purchase_orders WHERE contains(doc, 'lawn WITHIN comment') > 0;
More complex XML structure restrictions can be applied using the INPATH operator and an XPath expression. The query in Example 12-3 finds purchase orders that contain the word "electric" in the text() node of a comment element that is targeted by XPath expression /purchaseOrder/items/item/comment.
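A query of the kind described for Example 12-3 might look like the following sketch. It assumes a CONTEXT index on purchase_orders(doc) that uses section group CTXSYS.PATH_SECTION_GROUP, because INPATH is supported only with path sections.

```sql
-- Purchase orders whose item comments contain the word "electric".
SELECT id
  FROM purchase_orders
  WHERE contains(doc, 'electric INPATH (/purchaseOrder/items/item/comment)') > 0;
```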
XPath function ora:contains can be used in an XPath expression inside an XQuery expression or in a call to SQL/XML function XMLQuery, XMLTable, or XMLExists. It is used to restrict a structural search with a full-text predicate. It extends XPath through a standard mechanism: it is a user-defined function in the Oracle XML DB namespace, ora. It requires no index, but you can use an index with it to improve performance.
ora:contains(input_text NODE*, text_query STRING [,policy_name STRING] [,policy_owner STRING])
Function ora:contains returns a positive integer when the input_text matches text_query (the higher the number, the more relevant the match), and zero otherwise. When used in an XQuery expression, the XQuery return type is xs:integer. When used in an XPath expression outside of an XQuery expression, the XPath return type is number.
Argument input_text must evaluate to a single text node or an attribute. The syntax and semantics of text_query in ora:contains are the same as text_query in contains, with the following restrictions:
Argument text_query cannot include any structure operators (WITHIN, INPATH, or HASPATH).
If the weight score-weighting operator is used, the weights are ignored.
Example 12-4 shows a call to ora:contains in the XPath parameter to XMLExists. Notice the namespace declaration that declares prefix ora as representing the Oracle XML DB namespace.
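A call of that shape might look like the following sketch. It assumes table purchase_orders_xmltype with columns id and doc, and combines Boolean operators with the stem operator $ inside the ora:contains text query. The empty XQuery comment (: :) after the namespace declaration keeps SQL*Plus from treating the line-ending semicolon as the end of the SQL statement.

```sql
-- Purchase orders whose top-level comment matches the full-text predicate.
SELECT id
  FROM purchase_orders_xmltype
  WHERE XMLExists('declare namespace ora = "http://xmlns.oracle.com/xdb"; (: :)
                   $d/purchaseOrder/comment[ora:contains(text(), "($lawns AND wild) OR flamingo") > 0]'
                  PASSING doc AS "d");
```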
Both Oracle SQL function contains and Oracle XPath function ora:contains let you combine searching on XML structure with full-text searching. These are the main differences between them:
Oracle SQL function contains:
Needs a CONTEXT index to run. If there is no index, then an error is raised.
Does an indexed search and is generally very fast.
Returns a score (through Oracle SQL function score).
Restricts a search based on documents (rows in a table) rather than nodes.
Cannot be used for XML structure-based projection (extracting parts of an XML document).
Oracle XPath function ora:contains:
Does not need an index to run, but you can use an index to improve performance.
Might do an unindexed search, so it might be resource-intensive.
Separates application logic from storing and indexing considerations.
Does not return a score.
Can be used for XML structure-based projection (extracting parts of an XML document).
Use Oracle SQL function contains when you want a fast, index-based, full-text search over XML documents, possibly with simple XML structure constraints. Use Oracle XPath function ora:contains when you need the flexibility of full-text search combined with XPath navigation (possibly without an index) or when you need to do projection, and you do not need a score.
The second argument to Oracle SQL function contains, text_query, is a string that specifies the full-text search. text_query has its own language, based on the SQL/MM Full-Text standard.
See Also:
ISO/IEC 13249-2:2000, Information technology - Database languages - SQL Multimedia and Application Packages - Part 2: Full-Text, International Organization For Standardization, 2000
Oracle Text Reference for more information about the operators in the text_query language
The examples in the rest of this section show some of the power of full-text search. They use only a few of the available operators. The example queries search over a VARCHAR2 column (PURCHASE_ORDERS.doc) with a text index (index type CTXSYS.CONTEXT).
The text_query language supports arbitrary combinations of AND, OR, and NOT. Precedence can be controlled using parentheses. The Boolean operators can be written in any of the following ways:
AND, OR, NOT
and, or, not
&, |, ~
Note that NOT is a binary, not a unary, operator here. The expression alpha NOT(beta) is equivalent to alpha AND unary-not(beta), where unary-not stands for unary negation.
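For illustration, the following sketch against purchase_orders shows a simple conjunction, a parenthesized combination that uses binary NOT, and the same combination written with the symbolic operators. The words garden, flooded, and flamingo are arbitrary search terms chosen for the example.

```sql
-- Documents that contain both "lawn" and "wild".
SELECT id FROM purchase_orders WHERE contains(doc, 'lawn AND wild') > 0;

-- Binary NOT: documents matching the left operand that do not contain "flamingo".
SELECT id FROM purchase_orders
  WHERE contains(doc, '((lawn OR garden) AND (wild OR flooded)) NOT(flamingo)') > 0;

-- The same query using the symbolic forms of AND, OR, and NOT.
SELECT id FROM purchase_orders
  WHERE contains(doc, '((lawn | garden) & (wild | flooded)) ~ flamingo') > 0;
```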
See Also:
Oracle Text Reference for complete information about the operators you can use in contains and ora:contains
The text_query language supports stemmed search. Example 12-7 returns all documents that contain some word with the same linguistic stem as "lawns", so it finds "lawn" or "lawns". The stem operator is written as a dollar sign ($).
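A stemmed query of this kind can be sketched as follows; the $ operator is written directly in front of the search term.

```sql
-- Matches words with the same linguistic stem as "lawns", such as "lawn" or "lawns".
SELECT id FROM purchase_orders WHERE contains(doc, '$lawns') > 0;
```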
You can combine operators in the text_query language, as shown in Example 12-8.
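For example, a query that mixes stemming with Boolean operators might be written as in this sketch, where parentheses make the intended grouping explicit:

```sql
-- A word stemmed from "lawns" together with "wild", or else the word "flamingo".
SELECT id FROM purchase_orders WHERE contains(doc, '($lawns AND wild) OR flamingo') > 0;
```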
Oracle SQL function score is related to Oracle SQL function contains. Function score can be used anywhere in a query. It is a measure of relevance, and it is especially useful when doing full-text searches across large document sets. Function score is typically returned as part of the query result, used in the ORDER BY clause, or both.
score(label NUMBER) RETURN NUMBER
In Example 12-9, score(10) returns the score for each row in the result set. Oracle SQL function score returns the relevance of a row in the result set with respect to a particular call to function contains. A call to score is linked to a call to contains by a LABEL (in this case the number 10).
Example 12-9 Simple CONTAINS Query with SCORE
SELECT score(10), id FROM purchase_orders WHERE contains(doc, 'lawn', 10) > 0 AND score(10) > 2 ORDER BY score(10) DESC;
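Because each call to score is tied to one contains call by its label, a query with two full-text predicates can report a separate relevance for each. The sketch below uses labels 1 and 2; the label values themselves are arbitrary.

```sql
-- score(1) reports relevance for "lawn"; score(2) reports relevance for "wild".
SELECT score(1) AS lawn_score, score(2) AS wild_score, id
  FROM purchase_orders
  WHERE contains(doc, 'lawn', 1) > 0
    AND contains(doc, 'wild', 2) > 0
  ORDER BY score(1) DESC;
```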
Function score always returns 0 if, for the corresponding contains expression, argument text_query does not match input_text, according to the matching rules dictated by the text index. If the contains text_query matches the input_text, then score returns a number greater than 0 and less than or equal to 100. This number indicates the relevance of the text_query to the input_text. A higher number means a better match.
If the contains text_query consists of only the HASPATH operator and a Text Path, the score is either 0 or 100, because HASPATH tests for an exact match.
See Also:
Oracle Text Reference for details on how the score is calculated
Oracle SQL function contains does a full-text search across the whole document, by default. For example, a search for "lawn" with no structure restriction finds all purchase orders with the word "lawn" anywhere in them.
There are three ways to restrict contains queries using XML structure:
WITHIN
INPATH
HASPATH
Note:
For the purposes of this discussion, consider a section to be the same as an XML node.
The WITHIN operator restricts a query to some section within an XML document. A search for purchase orders that contain the word "lawn" somewhere inside a comment section might use WITHIN. Section names are case-sensitive.
You can restrict the query further by nesting WITHIN. Example 12-11 finds all documents that contain the word "lawn" within a section "comment", where that occurrence of "lawn" is also within a section "item".
SELECT id FROM purchase_orders WHERE contains(doc, '(lawn WITHIN comment) WITHIN item') > 0;
Example 12-11 returns no rows. Our sample purchase order does contain the word "lawn" within a comment. But the only comment within an item is "Confirm this is electric". So the nested WITHIN query returns no rows.
You can also search within attributes. Example 12-12 finds all purchase orders that contain the word 10 in the orderDate attribute of a purchaseOrder element.
Example 12-12 WITHIN an Attribute
SELECT id FROM purchase_orders WHERE contains(doc, '10 WITHIN purchaseOrder@orderDate') > 0;
By default, the minus sign ("-") is treated as a word separator: "1999-10-20" is treated as the three words "1999", "10" and "20". So this query returns one row.
Text in an attribute is not a part of the main searchable document. A search for 10 without qualifying the text_query with WITHIN purchaseOrder@orderDate returns no rows.
You cannot search attributes in a nested WITHIN.
Suppose you want to find purchase orders that contain two words within a comment section: "lawn" and "electric". There can be more than one comment section in a purchaseOrder. So there are two ways to write this query, with two distinct results.
If you want to find purchase orders that contain both words, where each word occurs in some comment section, you would write a query like Example 12-13.
Example 12-13 WITHIN and AND: Two Words in Some Comment Section
SELECT id FROM purchase_orders WHERE contains(doc, '(lawn WITHIN comment) AND (electric WITHIN comment)') > 0;
If you run this query against the purchaseOrder data, then it returns 1 row. Note that the parentheses are not needed in this example, but they make the query more readable.
If you want to find purchase orders that contain both words, where both words occur in the same comment, you would write a query like Example 12-14.
Example 12-14 WITHIN and AND: Two Words in the Same Comment
SELECT id FROM purchase_orders WHERE contains(doc, '(lawn AND electric) WITHIN comment') > 0;
The query in Example 12-14 returns no rows. The query in Example 12-15, which omits the parentheses around lawn AND electric, returns one row.
Example 12-15 WITHIN and AND: No Parentheses
SELECT id FROM purchase_orders WHERE contains(doc, 'lawn AND electric WITHIN comment') > 0;
Operator WITHIN has a higher precedence than AND, so Example 12-15 is parsed as Example 12-16.
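Written with explicit parentheses, the parse described here looks like the following sketch, which makes clear that only "electric" is restricted to a comment section:

```sql
-- Equivalent to 'lawn AND electric WITHIN comment', because WITHIN binds tighter than AND.
SELECT id FROM purchase_orders WHERE contains(doc, 'lawn AND (electric WITHIN comment)') > 0;
```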
The preceding examples have used the WITHIN operator to search within a section. A section can be a:
path or zone section
This is a concatenation, in document order, of all text nodes that are descendants of a node, with whitespace separating the text nodes. To convert from a node to a zone section, you must serialize the node and replace all tags with whitespace. Path sections have the same scope and behavior as zone sections, except that path sections support queries with INPATH and HASPATH structure operators.
field section
This is the same as a zone section, except that repeating nodes in a document are concatenated into a single section, with whitespace as a separator.
attribute section
special section (sentence or paragraph)
See Also:
Oracle Text Reference for more information about special sections
Operator WITHIN provides an easy and intuitive way to express simple structure restrictions in the text_query. For queries that involve more extensive XML structure, you can use operator INPATH plus a text path instead of nested WITHIN operators.
Operator INPATH takes a text_query on the left and a Text Path, enclosed in parentheses, on the right. Example 12-17 finds purchaseOrders that contain the word "electric" in the path /purchaseOrder/items/item/comment.
Example 12-17 Structure Inside Full-Text Predicate: INPATH
SELECT id FROM purchase_orders WHERE contains(doc, 'electric INPATH (/purchaseOrder/items/item/comment)') > 0;
The scope of the search in Example 12-17 is the section indicated by the Text Path. The query in Example 12-18 uses a broader path than the query in Example 12-17, but it too returns one row.
Example 12-18 Structure Inside Full-Text Predicate: INPATH
SELECT id FROM purchase_orders WHERE contains(doc, 'electric INPATH (/purchaseOrder/items)') > 0;
The syntax and semantics of Text Path are based on the W3C XPath 1.0 recommendation. Simple path expressions are supported (abbreviated syntax only), but functions are not. The following examples are meant to give the general flavor.
See Also:
http://www.w3.org/TR/xpath for information about the W3C XPath 1.0 recommendation
"Text Path BNF Specification" for the Text Path grammar
Example 12-19 finds all purchase orders that contain the word "electric" in a comment element that is the child of an item element with a partNum attribute whose value is "872-AA", which in turn is the child of an items element that is any number of levels under the root node.
Example 12-19 INPATH with Complex Path Expression (1)
SELECT id FROM purchase_orders WHERE contains(doc, 'electric INPATH (//items/item[@partNum="872-AA"]/comment)') > 0;
Example 12-20 finds all purchase orders that contain the word "lawnmower" in a third-level item element (or any of its descendants) that has a comment element descendant at any level. This query returns one row. The scope of the query is not a comment element, but the set of item elements that each have a comment element as a descendant.
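A query matching this description might be sketched as follows. It assumes that a Text Path filter accepts the relative path .//comment as an existence test for a comment descendant; check the Text Path grammar in Oracle Text Reference before relying on it.

```sql
-- /*/*/item selects third-level item elements; [.//comment] keeps those with a comment descendant.
SELECT id FROM purchase_orders
  WHERE contains(doc, 'lawnmower INPATH (/*/*/item[.//comment])') > 0;
```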
The Text Path language differs from the XPath language in the following ways:
Not all XPath operators are included in the Text Path language.
XPath built-in functions are not included in the Text Path language.
Text Path language operators are case-insensitive.
If you use = inside a filter (brackets), then matching follows text-matching rules.
Rules for case-sensitivity, normalization, stopwords and whitespace depend on the text index definition. To emphasize this difference, this kind of equality is referred to here as text-equals.
Namespace support is not included in the Text Path language.
The name of an element, including a namespace prefix if it exists, is treated as a string. So two different namespace prefixes that map to the same namespace URI are not treated as equivalent in the Text Path language.
In a Text Path, the context is always the root node of the document.
So in the purchase-order data, purchaseOrder/items/item, /purchaseOrder/items/item, and ./purchaseOrder/items/item are all equivalent.
If you want to search within an attribute value, then the direct parent of the attribute must be specified (wildcards cannot be used).
A Text Path may not end in a wildcard (*).
See Also:
"Text Path BNF Specification" for the Text Path grammar
You can nest INPATH expressions. The context for the Text Path is always the root node. It is not changed by a nested INPATH.
Example 12-21 finds purchase orders that contain the word "electric" inside a comment element at any level, where the occurrence of that word is also in an items element that is a child of the top-level purchaseOrder element.
SELECT id FROM purchase_orders WHERE contains(doc, '(electric INPATH (//comment)) INPATH (/purchaseOrder/items)') > 0;
This nested INPATH query could be written more concisely as shown in Example 12-22.
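One way to express the same restriction with a single Text Path is sketched below; the descendant step // under items takes the place of the nested INPATH.

```sql
-- "electric" inside a comment element anywhere below /purchaseOrder/items.
SELECT id FROM purchase_orders
  WHERE contains(doc, 'electric INPATH (/purchaseOrder/items//comment)') > 0;
```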
Operator HASPATH takes only one operand: a Text Path, enclosed in parentheses, on the right. Use HASPATH when you want to find documents that contain a particular section in a particular path, possibly with predicate =. This is a path search rather than a full-text search. You can check for the existence of a section, or you can match the contents of a section, but you cannot do word searches. If your data is of type XMLType, then consider using SQL/XML function XMLExists instead of structure operator HASPATH.
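For XMLType data, structural tests like the HASPATH examples that follow can be sketched with XMLExists instead. This assumes the XMLType column doc of table purchase_orders_xmltype. No CONTEXT index is needed for this form, and the comparison in the second query is an ordinary XQuery comparison rather than text-equals.

```sql
-- Purchase orders that have some item with a USPrice child.
SELECT id FROM purchase_orders_xmltype
  WHERE XMLExists('$d/purchaseOrder//item/USPrice' PASSING doc AS "d");

-- Purchase orders that have some item whose USPrice equals 148.95 (XQuery comparison).
SELECT id FROM purchase_orders_xmltype
  WHERE XMLExists('$d/purchaseOrder//item[USPrice = 148.95]' PASSING doc AS "d");
```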
Example 12-23 finds purchaseOrders that have some item that has a USPrice.
SELECT id FROM purchase_orders WHERE contains(DOC, 'HASPATH (/purchaseOrder//item/USPrice)') > 0;
Example 12-24 finds purchaseOrders that have some item that has a USPrice that text-equals "148.95".
See Also:
"Text Path Compared to XPath" for an explanation of text-equals
Example 12-24 HASPATH Equality
SELECT id FROM purchase_orders WHERE contains(doc, 'HASPATH (/purchaseOrder//item/USPrice="148.95")') > 0;
HASPATH can be combined with other contains operators such as INPATH. Example 12-25 finds purchaseOrders that contain the word electric anywhere in the document, that have some item with a USPrice that text-equals 148.95, and that contain 10 in the purchaseOrder attribute orderDate.
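A query with all three conditions might be sketched as follows. Following the Text Path rules, the attribute is addressed through its direct parent, purchaseOrder.

```sql
-- Word search, path equality test, and attribute search combined with AND.
SELECT id FROM purchase_orders
  WHERE contains(doc, 'electric AND HASPATH (/purchaseOrder//item/USPrice="148.95") AND 10 INPATH (/purchaseOrder/@orderDate)') > 0;
```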
The result of a SQL query with a contains expression in the WHERE clause is always a set of rows (and possibly score information), or a projection over the rows that match the query.
If you want to return only a part of each XML document that satisfies a contains expression, then use SQL/XML function XMLQuery. The examples in this section use the XMLType table purchase_orders_xmltype.
Example 12-26 finds purchaseOrders that contain the word "electric" inside a comment element that is a descendant of the top-level element purchaseOrder. Instead of returning the ID of the row for each result, XMLQuery is used to return only the comment element.
Example 12-26 Scoping the Results of a CONTAINS Query
SELECT XMLQuery('declare namespace ora = "http://xmlns.oracle.com/xdb"; (: :)
                 $d/purchaseOrder//comment'
                PASSING doc AS "d" RETURNING CONTENT) "Item Comment"
  FROM purchase_orders_xmltype
  WHERE CONTAINS(doc, 'electric INPATH (/purchaseOrder//comment)') > 0;
The result of Example 12-26 is two instances of element comment. Function contains indicates which rows contain the word "electric" inside a comment element (the row with ID = 1), and function XMLQuery extracts all of the instances of element comment in the document at that row. There are two instances of element comment inside the purchaseOrder element, and the query returns both of them.
This might not be what you want. If you want the query to return only the instances of element comment that satisfy the contains expression, then you must repeat that predicate in the XQuery expression passed to XMLQuery. You do that using XPath function ora:contains. Example 12-27 illustrates this.
Example 12-27 Projecting the Result of a CONTAINS Query using ora:contains
SELECT XMLQuery('declare namespace ora = "http://xmlns.oracle.com/xdb"; (: :)
                 $d/purchaseOrder/items/item/comment[ora:contains(text(), "electric") > 0]'
                PASSING doc AS "d" RETURNING CONTENT) "Item Comment"
  FROM purchase_orders_xmltype
  WHERE CONTAINS(doc, 'electric INPATH (/purchaseOrder/items/item/comment)') > 0;
The general-purpose full-text index type is CONTEXT, which is owned by database user CTXSYS. To create a default full-text index, use the regular SQL CREATE INDEX command, and add the clause INDEXTYPE IS CTXSYS.CONTEXT, as shown in Example 12-28.
Example 12-28 Simple CONTEXT Index on Table PURCHASE_ORDERS
CREATE INDEX po_index ON purchase_orders(doc) INDEXTYPE IS CTXSYS.CONTEXT;
You have many choices available when building a full-text index. These choices are expressed as indexing preferences. To use an indexing preference, add the PARAMETERS clause to CREATE INDEX, as shown in Example 12-29.
See Also:
"CONTEXT Index Preferences"
Example 12-29 Simple CONTEXT Index on PURCHASE_ORDERS Table with Path Section Group
CREATE INDEX po_index ON purchase_orders(doc) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('section group CTXSYS.PATH_SECTION_GROUP');
Oracle Text provides other index types, such as CTXCAT and CTXRULE, which are outside the scope of this chapter.
You can build a CONTEXT index on any data that contains text. Example 12-28 creates a CONTEXT index on a VARCHAR2 column. The syntax to create a CONTEXT index on a column of type CHAR, VARCHAR, VARCHAR2, BLOB, CLOB, BFILE, XMLType, or URIType is the same. Example 12-30 creates a CONTEXT index on a column of type XMLType. The section group defaults to PATH_SECTION_GROUP.
Example 12-30 Simple CONTEXT Index on XMLType Column
CREATE INDEX po_index_xmltype ON purchase_orders_xmltype(doc) INDEXTYPE IS CTXSYS.CONTEXT;
If you have an XMLType table, then you must use object syntax to create the CONTEXT index, as shown in Example 12-31.
Example 12-31 Simple CONTEXT Index on XMLType Table
CREATE INDEX po_index_xmltype_table ON purchase_orders_xmltype_table (OBJECT_VALUE) INDEXTYPE IS CTXSYS.CONTEXT;
You can query the table as shown in Example 12-32.
Example 12-32 CONTAINS Query on XMLType Table
SELECT XMLCast(XMLQuery('declare namespace ora = "http://xmlns.oracle.com/xdb"; (: :)
                         $p/purchaseOrder/@orderDate'
                        PASSING po.OBJECT_VALUE AS "p" RETURNING CONTENT) AS DATE) "Order Date"
  FROM purchase_orders_xmltype_table po
  WHERE contains(po.OBJECT_VALUE, 'electric INPATH (/purchaseOrder//comment)') > 0;
The CONTEXT index, like most full-text indexes, is asynchronous. When indexed data is changed, the CONTEXT index might not change until you take some action, such as calling a procedure to synchronize the index.
The CONTEXT index can become fragmented over time. A fragmented index uses more space and leads to slower queries. There are a number of ways to optimize (defragment) the CONTEXT index.
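As a sketch, assuming the index po_index from Example 12-28, the following PL/SQL blocks synchronize the index after DML and then optimize it. CTX_DDL.sync_index and CTX_DDL.optimize_index are described in Oracle Text Reference, along with the other optimization levels and options for scheduling this work.

```sql
-- Apply pending inserts, updates, and deletes on purchase_orders.doc to the index.
BEGIN
  CTX_DDL.sync_index('po_index');
END;
/

-- Defragment the index; 'FULL' is one of the supported optimization levels.
BEGIN
  CTX_DDL.optimize_index('po_index', 'FULL');
END;
/
```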
To use Oracle SQL function contains, you must create an index of type CONTEXT. If you call contains, and the column given in the first argument does not have an index of type CONTEXT, then an error is raised.
The syntax and semantics of text_query depend on the choices you make when you build the CONTEXT index. For example:
What counts as a word?
Are very common words processed?
What is a common word?
Is the text search case-sensitive?
Can the text search include themes (concepts) in addition to keywords?
A preference can be considered a collection of indexing choices. Preferences include section group, datastore, filter, lexer, wordlist, stoplist and storage. This section shows how to set up a lexer preference to make searches case-sensitive.
You can use procedure CTX_DDL.create_preference (or CTX_DDL.create_stoplist) to create a preference. Override default choices in that preference group by setting attributes of the new preference, using procedure CTX_DDL.set_attribute. Then use the preference in a CONTEXT index by including preference type preference_name in the PARAMETERS string of CREATE INDEX.
Once a preference has been created, you can use it to build any number of indexes.
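The same workflow applies to stoplists. The following sketch creates a stoplist named my_stoplist (a hypothetical name), adds two stopwords, and references it in the PARAMETERS string of a hypothetical index po_index_stop. It assumes no other CONTEXT index exists on purchase_orders(doc), since Oracle does not allow two domain indexes of the same index type on one column.

```sql
BEGIN
  -- my_stoplist is a hypothetical name.
  CTX_DDL.create_stoplist('my_stoplist', 'BASIC_STOPLIST');
  CTX_DDL.add_stopword('my_stoplist', 'the');
  CTX_DDL.add_stopword('my_stoplist', 'a');
END;
/

-- Drop any existing CONTEXT index on purchase_orders(doc) before running this.
CREATE INDEX po_index_stop ON purchase_orders(doc) INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('stoplist my_stoplist section group CTXSYS.PATH_SECTION_GROUP');
```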
Full-text searches with contains are case-insensitive by default. That is, when matching words in text_query against words in the document, case is not considered. Section names and attribute names, however, are always case-sensitive.
If you want full-text searches to be case-sensitive, then you need to make that choice when building the CONTEXT index. Example 12-33 returns 1 row, because "HURRY" in text_query matches "Hurry" in the purchaseOrder with the default case-insensitive index.
Example 12-33 CONTAINS: Default Case Matching
SELECT id FROM purchase_orders WHERE contains(doc, 'HURRY INPATH (/purchaseOrder/comment)') > 0;
Example 12-34 creates a new lexer preference my_lexer, with the attribute mixed_case set to TRUE. It also sets printjoin characters to "-", "!", and ",". You can use the same preferences for building CONTEXT indexes and for building policies.
See Also:
Oracle Text Reference for a full list of lexer attributes
Example 12-34 Create a Preference for Mixed Case
BEGIN
  CTX_DDL.create_preference(PREFERENCE_NAME => 'my_lexer', OBJECT_NAME => 'BASIC_LEXER');
  CTX_DDL.set_attribute(PREFERENCE_NAME => 'my_lexer', ATTRIBUTE_NAME => 'mixed_case', ATTRIBUTE_VALUE => 'TRUE');
  CTX_DDL.set_attribute(PREFERENCE_NAME => 'my_lexer', ATTRIBUTE_NAME => 'printjoins', ATTRIBUTE_VALUE => '-,!');
END;
/
Example 12-35 builds a CONTEXT index using the new my_lexer lexer preference. If index po_index from Example 12-28 already exists, drop it first.
Example 12-35 CONTEXT Index on PURCHASE_ORDERS Table, Mixed Case
CREATE INDEX po_index ON purchase_orders(doc) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS('lexer my_lexer section group CTXSYS.PATH_SECTION_GROUP');
With this index, the query in Example 12-33 returns no rows, because "HURRY" in text_query no longer matches "Hurry" in the purchaseOrder. Example 12-36 returns one row, because the text_query term "Hurry" exactly matches the word "Hurry" in the purchaseOrder.
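A case-sensitive query of the kind described for Example 12-36 might look like this sketch, which differs from Example 12-33 only in the case of the search term:

```sql
-- With the mixed-case index from Example 12-35, only the exact form "Hurry" matches.
SELECT id FROM purchase_orders WHERE contains(doc, 'Hurry INPATH (/purchaseOrder/comment)') > 0;
```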
One of the choices you make when creating a CONTEXT index is section group. A section group instance is based on a section group type. The section group type specifies the kind of structure in your documents, and how to index (and therefore search) that structure. The section group instance may specify which structure elements are indexed. Most users either take the default section group or use a predefined section group.
The section group types useful in XML searching are:
PATH_SECTION_GROUP
Choose this when you want to use WITHIN, INPATH and HASPATH in queries, and you want to be able to consider all sections to scope the query.
XML_SECTION_GROUP
Choose this when you want to use WITHIN, but not INPATH and HASPATH, in queries, and you want to be able to consider only explicitly-defined sections to scope the query. XML_SECTION_GROUP section group type supports FIELD sections in addition to ZONE sections. In some cases FIELD sections offer significantly better query performance.
AUTO_SECTION_GROUP
Choose this when you want to use WITHIN, but not INPATH and HASPATH, in queries, and you want to be able to consider most sections to scope the query. By default all sections are indexed (available for query restriction). You can specify that some sections are not indexed (by defining STOP sections).
NULL_SECTION_GROUP
Choose this when defining no XML sections.
Other section group types include:
BASIC_SECTION_GROUP
HTML_SECTION_GROUP
NEWS_SECTION_GROUP
Oracle recommends that most users with XML full-text search requirements use PATH_SECTION_GROUP. Some users might prefer XML_SECTION_GROUP with FIELD sections. This choice generally gives better query performance and a smaller index, but it is limited to documents with fielded structure (searchable nodes are all leaf nodes that do not repeat).
See Also:
Oracle Text Reference for a detailed description of the XML_SECTION_GROUP section group type
When choosing a section group to use with your index, you can choose a supplied section group, take the default, or create a new section group based on the section group type you have chosen.
There are supplied section groups for section group types PATH_SECTION_GROUP, AUTO_SECTION_GROUP, and NULL_SECTION_GROUP. The supplied section groups are owned by CTXSYS and have the same name as their section group types. For example, the supplied section group of section group type PATH_SECTION_GROUP is CTXSYS.PATH_SECTION_GROUP.
There is no supplied section group for section group type XML_SECTION_GROUP, because a default XML_SECTION_GROUP would be empty and therefore meaningless. If you want to use section group type XML_SECTION_GROUP, then you must create a new section group and specify each node that you want to include as a section.
When you create a CONTEXT index on data of type XMLType, the default section group is the supplied section group CTXSYS.PATH_SECTION_GROUP. If the data is VARCHAR or CLOB, then the default section group is CTXSYS.NULL_SECTION_GROUP.
See Also:
Oracle Text Reference for instructions on creating your own section group
To associate a section group with an index, add section group <section group name> to the PARAMETERS string, as in Example 12-37.
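As a sketch of creating and associating your own section group, the following code defines an XML_SECTION_GROUP named po_xml_sg (a hypothetical name) with a field section, a zone section, and an attribute section, then names it in the PARAMETERS string. It assumes any existing CONTEXT index on purchase_orders(doc) has been dropped first. Because XML_SECTION_GROUP does not support INPATH or HASPATH, queries against this index use WITHIN with the section names defined here.

```sql
BEGIN
  CTX_DDL.create_section_group('po_xml_sg', 'XML_SECTION_GROUP');
  -- Field section for comment; TRUE also keeps its text searchable outside WITHIN queries.
  CTX_DDL.add_field_section('po_xml_sg', 'comment', 'comment', TRUE);
  -- Zone section for item elements.
  CTX_DDL.add_zone_section('po_xml_sg', 'item', 'item');
  -- Attribute section for the orderDate attribute of purchaseOrder.
  CTX_DDL.add_attr_section('po_xml_sg', 'orderdate', 'purchaseOrder@orderDate');
END;
/

CREATE INDEX po_index ON purchase_orders(doc) INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('section group po_xml_sg');

-- A query that uses one of the new sections.
SELECT id FROM purchase_orders WHERE contains(doc, 'lawn WITHIN comment') > 0;
```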
Function ora:contains is an Oracle-defined XQuery (XPath) function for use in the XQuery expression argument to SQL/XML functions XMLQuery, XMLTable, and XMLExists.
When you use ora:contains, you must also supply a namespace declaration that maps prefix ora to the Oracle XML DB namespace, xmlns:ora="http://xmlns.oracle.com/xdb".
Function ora:contains returns a number, not a score. It returns a positive number if text_query matches input_text, and zero otherwise.
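The following Python sketch mimics only the contract described above for a one-word query. It returns a positive number on a match and zero otherwise, and it matches whole words rather than substrings. The helper toy_contains, its tokenizer (runs of ASCII letters and digits), and the use of a hit count as the positive value are all assumptions. Oracle does not document the positive value as a count, so queries should only compare it with zero.

```python
import re

def toy_contains(input_text, word):
    """Hypothetical stand-in for the ora:contains contract with a one-word
    query: returns a positive number on a whole-word match, otherwise 0.
    Matching is case-insensitive; the positive value here is a hit count."""
    tokens = re.findall(r"[a-z0-9]+", input_text.lower())
    return tokens.count(word.lower())

comment = "Hurry, my lawn is going wild!"
print(toy_contains(comment, "lawn") > 0)  # True
print(toy_contains(comment, "law") > 0)   # False: "law" is only a substring of "lawn"
```

As in the examples in this chapter, test the result with > 0 rather than relying on its magnitude.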
The ora:contains argument text_query is a string that specifies the full-text search. The ora:contains text_query is the same as the contains text_query, with the following restrictions:
ora:contains text_query must not include any of the structure operators WITHIN, INPATH, or HASPATH.
ora:contains text_query can include the score weighting operator weight(*), but weights are ignored.
If you include any of the following in the ora:contains text_query, then the query cannot use a CONTEXT index:
Score-based operator MINUS (-) or threshold (>)
Selective, corpus-based expansion operator FUZZY (?) or soundex (!)
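The restrictions above can be summarized as a rough pre-flight check on a text_query string. This Python sketch is hypothetical: classify_text_query is not part of Oracle Text, and its purely lexical scan only approximates the real query parser. It ignores quoting, so a search for the word "minus" or for a hyphenated term is also flagged.

```python
import re

STRUCTURE_OPERATORS = {"WITHIN", "INPATH", "HASPATH"}
NO_INDEX_KEYWORDS = {"MINUS", "FUZZY"}
NO_INDEX_SYMBOLS = "->?!"  # MINUS, threshold, FUZZY, soundex

def classify_text_query(text_query):
    """Rough, lexical check of an ora:contains text_query.
    Returns "invalid", "no_index", or "index_ok"."""
    words = {w.upper() for w in re.findall(r"[A-Za-z_]+", text_query)}
    if words & STRUCTURE_OPERATORS:
        return "invalid"  # structure operators are not allowed at all
    if words & NO_INDEX_KEYWORDS or any(c in text_query for c in NO_INDEX_SYMBOLS):
        return "no_index"  # allowed, but a CONTEXT index cannot be used
    return "index_ok"

print(classify_text_query("($lawns AND wild) OR flamingo"))  # index_ok
print(classify_text_query("?lawn"))                          # no_index
print(classify_text_query("lawn WITHIN comment"))            # invalid
```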
See Also:
"XPath Rewrite and CONTEXT Indexes"
Example 12-4 shows a full-text search using an arbitrary combination of Boolean operators and $ (stemming).
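To show how Boolean operators and stemming combine, the following Python sketch evaluates the query used in Example 12-38, ($lawns AND wild) OR flamingo, against the two comments in the sample purchase order. The query is translated by hand into nested tuples rather than parsed. The stemmer, which drops one trailing "s", is a deliberately crude assumption; Oracle Text uses language-aware stemming.

```python
import re

def tokens(text):
    """Lowercased whole words (runs of ASCII letters and digits)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def crude_stem(word):
    # Assumption: drop one trailing "s"; Oracle Text stemming is linguistic
    return word[:-1] if word.endswith("s") else word

def evaluate(query, words):
    """Evaluate a query tree of ("WORD", w), ("STEM", w), ("AND", q1, q2)
    and ("OR", q1, q2) nodes against a set of words."""
    op = query[0]
    if op == "WORD":
        return query[1] in words
    if op == "STEM":  # like $term: any word with the same (crude) stem
        return any(crude_stem(w) == crude_stem(query[1]) for w in words)
    if op == "AND":
        return evaluate(query[1], words) and evaluate(query[2], words)
    if op == "OR":
        return evaluate(query[1], words) or evaluate(query[2], words)
    raise ValueError("unknown operator: " + op)

# ($lawns AND wild) OR flamingo, translated by hand
QUERY = ("OR", ("AND", ("STEM", "lawns"), ("WORD", "wild")), ("WORD", "flamingo"))

print(evaluate(QUERY, tokens("Hurry, my lawn is going wild!")))  # True
print(evaluate(QUERY, tokens("Confirm this is electric")))       # False
```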
See Also:
"Full-Text Search using SQL Function CONTAINS" for a description of full-text operators
Oracle Text Reference for a full list of the operators you can use in contains and ora:contains
Matching rules are defined by the policy, policy_owner.policy_name. If policy_owner is absent, then the policy owner defaults to the current user. If both policy_name and policy_owner are absent, then the policy defaults to CTXSYS.DEFAULT_POLICY_ORACONTAINS.
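The defaulting rules above can be written as a small resolution function. In this Python sketch, resolve_policy and the user name SCOTT are hypothetical. The sketch does not model how Oracle folds the case of policy names; it takes names exactly as given.

```python
DEFAULT_POLICY = "CTXSYS.DEFAULT_POLICY_ORACONTAINS"

def resolve_policy(policy_arg, current_user):
    """Return the owner-qualified policy that ora:contains would use.
    policy_arg is the optional third argument: None, "NAME", or "OWNER.NAME"."""
    if policy_arg is None:
        return DEFAULT_POLICY  # both owner and name absent
    owner, dot, name = policy_arg.rpartition(".")
    if not dot:
        owner = current_user  # no owner given: default to the current user
    return owner + "." + name

print(resolve_policy(None, "SCOTT"))                # CTXSYS.DEFAULT_POLICY_ORACONTAINS
print(resolve_policy("MY_POLICY", "SCOTT"))         # SCOTT.MY_POLICY
print(resolve_policy("CTXSYS.MY_POLICY", "SCOTT"))  # CTXSYS.MY_POLICY
```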
When you use ora:contains in an XPath expression, the scope is defined by argument input_text. This argument is evaluated in the current XPath context. If the result is a single text node or an attribute, then that node is the target of the ora:contains search. If input_text does not evaluate to a single text node or an attribute, then an error is raised.
The policy determines the matching rules for ora:contains. The section group associated with the default policy for ora:contains is of type NULL_SECTION_GROUP.
ora:contains can be used anywhere in an XPath expression, and its input_text argument can be any XPath expression that evaluates to a single text node or an attribute.
If you want to return only a part of each XML document, then use function XMLQuery to project a node sequence, possibly applying XMLCast to the result to project the scalar value of a node.
Example 12-38 returns the orderDate for each purchase order that has a comment matching the full-text query ($lawns AND wild) OR flamingo, such as the sample comment that contains "lawn" and "wild".
Example 12-38 Using ora:contains with XMLQuery and XMLExists
SELECT XMLCast(XMLQuery('declare namespace ora = "http://xmlns.oracle.com/xdb"; (: :)
                         $d/purchaseOrder/@orderDate'
                        PASSING doc AS "d" RETURNING CONTENT)
               AS DATE) "Order date"
  FROM purchase_orders_xmltype
  WHERE XMLExists('declare namespace ora = "http://xmlns.oracle.com/xdb"; (: :)
                   $d/purchaseOrder/comment
                     [ora:contains(text(), "($lawns AND wild) OR flamingo") > 0]'
                  PASSING doc AS "d");
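Function XMLExists restricts the result to rows (documents) where the purchaseOrder element includes some comment that matches the query. In the sample document, the comment "Hurry, my lawn is going wild!" matches because it contains "lawn", which has the same stem as "lawns", and it also contains "wild". Function XMLQuery then returns the value of attribute orderDate from those purchaseOrder elements. Function XMLCast casts this result as a SQL DATE value.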
If //comment had been projected instead, then both comments from the sample document would have been returned, not only the comment that matches the WHERE clause.
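The difference between filtering and projecting can be illustrated outside the database. This Python sketch uses xml.etree.ElementTree on a trimmed copy of the sample purchase order as a rough analogue of Example 12-38. It is not how Oracle evaluates the query. The full-text condition is simplified to a single-word test for "lawn", and the helper name words is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the sample purchase order
PO = """<purchaseOrder orderDate="1999-10-20">
  <comment>Hurry, my lawn is going wild!</comment>
  <items>
    <item partNum="872-AA">
      <comment>Confirm this is electric</comment>
    </item>
  </items>
</purchaseOrder>"""

root = ET.fromstring(PO)

def words(el):
    """Lowercased whole words in an element's own text."""
    return re.findall(r"[a-z0-9]+", (el.text or "").lower())

# Rough analogue of the XMLExists filter: some /purchaseOrder/comment mentions "lawn"
matches = any("lawn" in words(c) for c in root.findall("comment"))

# Rough analogue of XMLQuery plus XMLCast: project only the orderDate attribute
order_date = root.get("orderDate") if matches else None
print(order_date)                       # 1999-10-20

# Projecting every comment instead returns both comments, matching or not
print(len(root.findall(".//comment")))  # 2
```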
The CONTEXT index on a column determines the semantics of contains queries on that column. Because ora:contains does not rely on a supporting index, some other means must be found to provide many of the same choices when doing ora:contains queries. A policy is a collection of preferences that can be associated with an ora:contains query to give the same sort of semantic control as the indexing choices give to the contains user.
When using Oracle SQL function contains, indexing preferences affect the semantics of the query. You create a preference using procedure CTX_DDL.create_preference (or CTX_DDL.create_stoplist). You override default choices by setting attributes of the new preference, using procedure CTX_DDL.set_attribute. Then you use the preference in a CONTEXT index by including preference_type preference_name in the PARAMETERS string of CREATE INDEX.
See Also:
"CONTEXT Index Preferences"
Because ora:contains does not have a supporting index, a different mechanism is needed to apply preferences to a query. That mechanism is a policy, consisting of a collection of preferences, and it is used as a parameter to ora:contains.
Example 12-39 creates a policy with an empty stopwords list.
Example 12-39 Create a Policy to Use with ora:contains
BEGIN
CTX_DDL.create_policy(POLICY_NAME => 'my_nostopwords_policy',
STOPLIST => 'CTXSYS.EMPTY_STOPLIST');
END;
/
For simplicity, this policy consists of an empty stoplist, which is owned by user CTXSYS. You could create a new stoplist to include in this policy, or you could reuse a stoplist (or lexer) definition that you created for a CONTEXT index.
Refer to this policy in an ora:contains expression to search for all words, including the most common ones (stopwords). Example 12-40 returns zero comments, because "is" is a stopword by default and cannot be queried.
Example 12-40 Finding a Stopword using ora:contains
SELECT id FROM purchase_orders_xmltype
WHERE XMLExists(
'declare namespace ora = "http://xmlns.oracle.com/xdb"; (: :)
$d/purchaseOrder/comment[ora:contains(text(), "is") > 0]'
PASSING doc AS "d");
Example 12-41 uses the policy created in Example 12-39 to specify an empty stopword list. This query finds "is" and returns 1 comment.
Example 12-41 Finding a Stopword using ora:contains and Policy my_nostopwords_policy
SELECT id FROM purchase_orders_xmltype
WHERE XMLExists(
'declare namespace ora = "http://xmlns.oracle.com/xdb"; (: :)
$d/purchaseOrder/comment
[ora:contains(text(), "is", "MY_NOSTOPWORDS_POLICY") > 0]'
PASSING doc AS "d");
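The effect of the stoplist in Examples 12-40 and 12-41 can be simulated in a few lines of Python. The stopword set below is a hypothetical subset chosen for illustration, because the real CTXSYS.DEFAULT_STOPLIST is much larger. The function matches is not an Oracle API.

```python
import re

# Hypothetical subset of a default English stoplist; the real
# CTXSYS.DEFAULT_STOPLIST is larger.
DEFAULT_STOPWORDS = {"a", "an", "and", "is", "the"}
EMPTY_STOPLIST = set()

def matches(text, word, stopwords=DEFAULT_STOPWORDS):
    """Whole-word, case-insensitive match in which stopwords never match."""
    word = word.lower()
    if word in stopwords:
        return False  # stopwords are not searchable under this stoplist
    return word in re.findall(r"[a-z0-9]+", text.lower())

comment = "Hurry, my lawn is going wild!"  # text of /purchaseOrder/comment
print(matches(comment, "is"))                  # False, as in Example 12-40
print(matches(comment, "is", EMPTY_STOPLIST))  # True, as in Example 12-41
```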
Example 12-41 uses policy my_nostopwords_policy. The name given in Example 12-39 was implicitly converted to all uppercase. Because XPath is case-sensitive, the XPath predicate must refer to the policy in all uppercase: MY_NOSTOPWORDS_POLICY, not my_nostopwords_policy.
The ora:contains policy affects the matching semantics of text_query. The ora:contains policy may include a lexer, stoplist, or wordlist preference, or any combination of these. Other preferences that can be used to build a CONTEXT index are not applicable to ora:contains. The effects of the preferences are as follows:
The wordlist preference tweaks the semantics of the stem operator.
The stoplist preference defines which words are too common to be indexed (searchable).
The lexer preference defines how words are tokenized and matched. For example, it defines which characters count as part of a word and whether matching is case-sensitive.
See Also:
"Policy Example: Supplied Stoplist" for an example of building a policy with a predefined stoplist
"Policy Example: User-Defined Lexer" for an example of a case-sensitive policy
When you search for a document that contains a particular word, you usually want the search to be case-insensitive. If you do a search that is case-sensitive, then you often miss some expected results. For example, if you search for purchase orders that contain the phrase "baby monitor", then you would not expect to miss our example document just because the phrase is written "Baby Monitor".
Full-text searches with ora:contains are case-insensitive by default. Section names and attribute names, however, are always case-sensitive.
If you want full-text searches to be case-sensitive, then you need to make that choice when you create a policy. You can use this procedure:
Create a preference using procedure CTX_DDL.create_preference (or CTX_DDL.create_stoplist).
Override default choices in that preference object by setting attributes of the new preference, using procedure CTX_DDL.set_attribute.
Use the preference as a parameter to CTX_DDL.create_policy.
Use the policy name as the third argument to ora:contains in a query.
Once you have created a preference, you can reuse it in other policies or in CONTEXT index definitions. You can use any policy with any ora:contains query.
Example 12-42 returns 1 row, because "HURRY" in text_query matches "Hurry" in the purchaseOrder under the default case-insensitive policy.
Example 12-42 ora:contains, Default Case-Sensitivity
SELECT id FROM purchase_orders_xmltype
  WHERE XMLExists('declare namespace ora = "http://xmlns.oracle.com/xdb"; (: :)
                   $d/purchaseOrder/comment[ora:contains(text(), "HURRY") > 0]'
                  PASSING doc AS "d");
Example 12-43 creates a new lexer preference my_lexer, with the attribute mixed_case set to TRUE. It also sets the printjoins characters to "-", "!" and ",". You can use the same preferences for building CONTEXT indexes and for building policies.
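The following Python sketch shows what the printjoins setting changes. Characters listed as printjoins become part of a token instead of splitting it. This is a simplified model under stated assumptions: the function tokenize is hypothetical, and the real BASIC_LEXER applies additional rules that the sketch omits.

```python
import re

def tokenize(text, printjoins=""):
    """Split text into runs of ASCII letters and digits; characters listed
    in printjoins are treated as part of a token instead of as separators."""
    extra = "".join(re.escape(c) for c in printjoins)
    return re.findall("[A-Za-z0-9" + extra + "]+", text)

print(tokenize("Part 872-AA ships"))         # ['Part', '872', 'AA', 'ships']
print(tokenize("Part 872-AA ships", "-,!"))  # ['Part', '872-AA', 'ships']
```

With "-" as a printjoin, a part number such as 872-AA is indexed and matched as one token rather than as two separate words.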
See Also:
Oracle Text Reference for a full list of lexer attributes
Example 12-43 Create a Preference for Mixed Case
BEGIN
  CTX_DDL.create_preference(PREFERENCE_NAME => 'my_lexer',
                            OBJECT_NAME     => 'BASIC_LEXER');
  CTX_DDL.set_attribute(PREFERENCE_NAME => 'MY_LEXER',
                        ATTRIBUTE_NAME  => 'MIXED_CASE',
                        ATTRIBUTE_VALUE => 'TRUE');
  CTX_DDL.set_attribute(PREFERENCE_NAME => 'my_lexer',
                        ATTRIBUTE_NAME  => 'printjoins',
                        ATTRIBUTE_VALUE => '-,!');
END;
/
Example 12-44 creates a new policy my_policy and specifies only the lexer, using the mixed-case preference my_lexer from Example 12-43. All other preferences take their default values.
Example 12-44 Create a Policy with Mixed Case (Case-Sensitive)
BEGIN
  CTX_DDL.create_policy(POLICY_NAME => 'my_policy',
                        LEXER       => 'my_lexer');
END;
/
Example 12-45 uses the new policy in a query. It returns no rows, because "HURRY" in text_query no longer matches "Hurry" in the purchaseOrder.
Example 12-45 ora:contains, Case-Sensitive (1)
SELECT id FROM purchase_orders_xmltype
  WHERE XMLExists('declare namespace ora = "http://xmlns.oracle.com/xdb"; (: :)
                   $d/purchaseOrder/comment
                     [ora:contains(text(), "HURRY", "MY_POLICY") > 0]'
                  PASSING doc AS "d");
Example 12-46 returns one row, because the text_query term "Hurry" exactly matches the text "Hurry" in the comment element.
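The behavior of Examples 12-42, 12-45, and 12-46 can be reproduced with a toy matcher. In this Python sketch, the function contains_word is hypothetical, and its mixed_case flag is modeled on the lexer attribute MIXED_CASE. When the flag is off, both the text and the query are folded to lowercase before comparison.

```python
import re

def contains_word(text, word, mixed_case=False):
    """Whole-word match. When mixed_case is False (the default behavior
    described above), text and query are both folded to lowercase."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    if not mixed_case:
        tokens = [t.lower() for t in tokens]
        word = word.lower()
    return word in tokens

comment = "Hurry, my lawn is going wild!"
print(contains_word(comment, "HURRY"))                   # True, like Example 12-42
print(contains_word(comment, "HURRY", mixed_case=True))  # False, like Example 12-45
print(contains_word(comment, "Hurry", mixed_case=True))  # True, like Example 12-46
```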
The policy argument to ora:contains is optional. If it is omitted, then the query uses the default policy CTXSYS.DEFAULT_POLICY_ORACONTAINS.
When you create a policy for use with ora:contains, you do not need to specify every preference. In Example 12-44, for example, only the lexer preference was specified. For the preferences that are not specified, CREATE_POLICY uses the default preferences:
CTXSYS.DEFAULT_LEXER
CTXSYS.DEFAULT_STOPLIST
CTXSYS.DEFAULT_WORDLIST
Creating a policy follows copy semantics for preferences and their attributes, just as creating a CONTEXT index follows copy semantics for index metadata.
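The defaulting and copy behavior described above can be modeled with plain dictionaries. In this Python sketch, create_policy and the preferences registry are hypothetical stand-ins for CTX_DDL.create_policy and the preference metadata. Only the three default preference names come from the text. Because the policy stores a copy, changing the preference afterward leaves the policy unchanged.

```python
import copy

DEFAULT_PREFERENCES = {
    "lexer": "CTXSYS.DEFAULT_LEXER",
    "stoplist": "CTXSYS.DEFAULT_STOPLIST",
    "wordlist": "CTXSYS.DEFAULT_WORDLIST",
}

# Hypothetical registry of preference attributes, keyed by preference name
preferences = {"MY_LEXER": {"MIXED_CASE": "TRUE", "PRINTJOINS": "-,!"}}

def create_policy(lexer=None, stoplist=None, wordlist=None):
    """Model of policy creation: unspecified preferences take the defaults,
    and each preference's attributes are copied into the policy."""
    chosen = {"lexer": lexer, "stoplist": stoplist, "wordlist": wordlist}
    policy = {}
    for kind, name in chosen.items():
        name = (name or DEFAULT_PREFERENCES[kind]).upper()
        policy[kind] = (name, copy.deepcopy(preferences.get(name, {})))
    return policy

my_policy = create_policy(lexer="my_lexer")
preferences["MY_LEXER"]["MIXED_CASE"] = "FALSE"  # changed after the policy exists

print(my_policy["lexer"][1]["MIXED_CASE"])  # TRUE: the policy kept its own copy
print(my_policy["stoplist"][0])             # CTXSYS.DEFAULT_STOPLIST
```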
The ora:contains XPath function does not depend on a supporting index. This makes ora:contains very flexible, but it can also be resource-intensive when you use it to search large amounts of data without an index. This section shows how to get the best performance from queries that include XPath expressions with XPath function ora:contains.
Note:
Function-based indexes can also be very effective in speeding up XML queries, but they are not generally applicable to Text queries.
The examples in this section use table purchase_orders_xmltype_big. This table has the same structure and XML schema as purchase_orders_xmltype, but it has around 1,000 rows. Each row has a unique ID (in column id) and some different text in /purchaseOrder/items/item/comment. Where an execution plan is shown, it was produced using the SQL*Plus command AUTOTRACE. Execution plans can also be produced using SQL trace and the TKPROF utility. A description of AUTOTRACE, SQL trace, and TKPROF is outside the scope of this chapter.
Because ora:contains is relatively costly to process, Oracle recommends that you write queries that include a primary filter wherever possible. This minimizes the number of rows processed by ora:contains.
Example 12-47 examines each row in the table (a full table scan), as shown by the execution plan. In this example, ora:contains is evaluated for each row.
Example 12-47 ora:contains in Large Table
SELECT id FROM purchase_orders_xmltype_big
  WHERE XMLExists('declare namespace ora = "http://xmlns.oracle.com/xdb"; (: :)
                   $d/purchaseOrder/comment[ora:contains(text(), "constitution") > 0]'
                  PASSING doc AS "d");
Execution Plan
---------------------------------------------------------------------------------------------------------------
| Id | Operation                         | Name                        | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                  |                             |   32 | 64480 |    686 (38)| 00:00:09 |
|* 1 | FILTER                            |                             |      |       |            |          |
|  2 | TABLE ACCESS FULL                 | PURCHASE_ORDERS_XMLTYPE_BIG | 1161 | 2284K |    140  (3)| 00:00:02 |
|* 3 | COLLECTION ITERATOR PICKLER FETCH | XMLSEQUENCEFROMXMLTYPE      |      |       |            |          |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT 0 FROM TABLE() "KOKBF$" WHERE
              SYS_XMLCONTAINS(SYS_XQ_UPKXML2SQL(SYS_XQEXVAL(SYS_XQEXTRACT(SYS_XQCON2SEQ(VALUE(KOKBF$)),
              '/comment/text()'),1,50),50,1,0),'constitution')>0))
   3 - filter(SYS_XMLCONTAINS(SYS_XQ_UPKXML2SQL(SYS_XQEXVAL(SYS_XQEXTRACT(SYS_XQCON2SEQ(VALUE(KOKBF$)),
              '/comment/text()'),1,50),50,1,0),'constitution')>0)
Note
-----
   - dynamic sampling used for this statement
If you create an index on column id, as shown in Example 12-48, and you add a selective predicate on id to the query, as shown in Example 12-49, then the index on id drives the execution, as shown in the execution plan. Function ora:contains is then executed only for the rows where id is greater than 5.
Example 12-49 ora:contains in Large Table, with Additional Predicate
SELECT id FROM purchase_orders_xmltype_big
WHERE XMLExists(
'declare namespace ora = "http://xmlns.oracle.com/xdb"; (: :)
$d/purchaseOrder/comment[ora:contains(text(), "constitution") > 0]'
PASSING doc AS "d")
AND id > 5;
Execution Plan
---------------------------------------------------------------------------------------------------------
| Id | Operation                   | Name                        | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                             |    1 |  2015 |      8 (13)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID | PURCHASE_ORDERS_XMLTYPE_BIG |    1 |  2015 |      8 (13)| 00:00:01 |
|* 2 | INDEX RANGE SCAN            | ID_INDEX                    |   10 |       |      2  (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(EXISTSNODE(SYS_MAKEXML("PURCHASE_ORDERS_XMLTYPE_BIG"."SYS_NC00003$"),
              '/purchaseOrder/items/item/comment[ora:contains(text(), "constitution") > 0]',
              'xmlns:ora="http://xmlns.oracle.com/xdb"')=1)
   2 - access("ID">5)
Note
-----
   - dynamic sampling used for this statement
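The benefit of a primary filter is that the costly full-text test runs on fewer rows. This Python sketch counts evaluations of a stand-in for ora:contains with and without a cheap, selective predicate evaluated first. The data, the function costly_contains, and the predicate id <= 10 are all hypothetical and chosen only for illustration.

```python
# Hypothetical data: 1,000 rows of (id, comment); only id 3 mentions "constitution"
rows = [(i, "routine comment %d" % i) for i in range(1, 1001)]
rows[2] = (3, "a comment about the constitution")

calls = 0

def costly_contains(text, word):
    """Stand-in for ora:contains that counts how often it is evaluated."""
    global calls
    calls += 1
    return word in text.split()

# No primary filter: the full-text test runs for every row (full table scan)
calls = 0
full_scan = [i for i, text in rows if costly_contains(text, "constitution")]
full_scan_calls = calls

# Cheap, selective predicate first (id <= 10 is an arbitrary example),
# so the full-text test runs only for the rows that survive it
calls = 0
filtered = [i for i, text in rows if i <= 10 and costly_contains(text, "constitution")]
filtered_calls = calls

print(full_scan, full_scan_calls)  # [3] 1000
print(filtered, filtered_calls)    # [3] 10
```

In SQL, the optimizer decides which predicate drives the plan, not the order in which predicates are written. The sketch fixes the order by hand only to make the effect visible.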
Oracle Database can sometimes optimize a query that makes use of an XPath expression. This XPath rewriting is done automatically as part of query optimization.
Although Oracle XQuery function ora:contains does not rely on a supporting index, when XPath rewrite occurs ora:contains can often make use of an existing CONTEXT index for better performance.
See Also:
"Automatic Rewriting of XQuery and XPath Expressions" for more on the benefits of XPath rewrite
Chapter 8, "XPath Rewrite for Structured Storage" for a full discussion of XPath rewrite for object-relational storage
Example 12-50 ora:contains Search for "electric"
SELECT id FROM purchase_orders_xmltype
  WHERE XMLExists('declare namespace ora = "http://xmlns.oracle.com/xdb"; (: :)
                   $d/purchaseOrder/items/item/comment
                     [ora:contains(text(), "electric") > 0]'
                  PASSING doc AS "d");
A naive evaluation of the XPath expression in Example 12-50 would test each cell in column doc to see if it matches that expression.
But if doc is XML schema-based, and the purchaseOrder documents are physically stored in object-relational tables, then it makes sense to go straight to column comment (if such a column exists) and test each cell there to see if it matches "electric".
If the first argument to ora:contains maps to a single relational column, then ora:contains can be applied to that column, instead of applying the complete XPath expression to the entire XML document. Even if no indexes are involved, this can significantly improve query performance.
If you are using ora:contains with a text node or an attribute that maps to a column that has a CONTEXT index, then that index can sometimes be applied to the data in the underlying column. Both of the following conditions must be true for a CONTEXT index to be used with object-relational XMLType data:
The ora:contains target (input_text) must be either a single text node whose parent node maps to a column or an attribute that maps to a column. The column must be a single relational column (possibly in an ordered collection table).
As noted in "Policies for ora:contains Queries", the indexing choices (for contains) and policy choices (for ora:contains) affect the semantics of queries. A simple mismatch might be that the index-based contains performs a case-sensitive search, while ora:contains specifies a case-insensitive search. To ensure that ora:contains and the rewritten contains have the same semantics, the ora:contains policy must exactly match the index choices of the CONTEXT index.
Both the ora:contains policy and the CONTEXT index must also use the NULL_SECTION_GROUP section group type. The default section group for an ora:contains policy is CTXSYS.NULL_SECTION_GROUP.
Finally, the CONTEXT index is generally asynchronous. If you add a new document that contains the word "dog" but do not synchronize the CONTEXT index, then a contains query for "dog" does not return that document, but an ora:contains query against the same data does. To ensure that ora:contains and the rewritten contains always return the same results, build the CONTEXT index with the TRANSACTIONAL keyword in the PARAMETERS string.
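The "dog" scenario above can be simulated with a toy index that only reflects documents present at the last synchronization. In this Python sketch, the class ToyContextIndex and its methods are hypothetical. They model an asynchronous index and a direct scan of the data, not the Oracle Text implementation.

```python
import re

def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())

class ToyContextIndex:
    """A word -> document-id index that only reflects documents added
    before the last sync(), like a non-transactional CONTEXT index."""

    def __init__(self):
        self.docs = {}
        self.index = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text  # stored, but not yet indexed

    def sync(self):
        self.index = {}
        for doc_id, text in self.docs.items():
            for word in set(words(text)):
                self.index.setdefault(word, set()).add(doc_id)

    def contains(self, word):
        """Index-based lookup, like SQL contains."""
        return self.index.get(word.lower(), set())

    def ora_contains(self, word):
        """Scans the stored text directly, like ora:contains without an index."""
        return {d for d, t in self.docs.items() if word.lower() in words(t)}

idx = ToyContextIndex()
idx.add(1, "Hurry, my lawn is going wild!")
idx.sync()
idx.add(2, "Please walk the dog")   # added after the last sync
print(idx.contains("dog"))      # set(): the index is not yet synchronized
print(idx.ora_contains("dog"))  # {2}
```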
See Also:
Oracle Text Reference for information about creating a CONTEXT index that is transactional using ALTER INDEX with parameter TRANSACTIONAL
A query with XMLQuery, XMLTable, or XMLExists, where the XPath includes ora:contains, can be considered for XPath rewrite if the ora:contains policy exactly matches the index choices of the CONTEXT index and if either of these conditions is true:
The XML data is stored object-relationally; the first ora:contains argument (input_text) is either a single text node whose parent node maps to a single relational column or an attribute that maps to a single relational column; and there is a transactional CONTEXT index on that column.
The XML data is binary XML that is indexed by an XMLIndex index, and there is a CONTEXT index on either the path-table VALUE column of an unstructured XMLIndex component or a scalar-value column of a structured XMLIndex component.
If the CONTEXT index is non-transactional, then you must also use the XQuery extension-expression pragma ora:use_text_index to force the use of the CONTEXT index. Example 12-51 illustrates this.
Example 12-51 Using XQuery Pragma ora:use_text_index with ora:contains
CREATE INDEX po_otext_ix ON my_path_table (VALUE)
  INDEXTYPE IS CTXSYS.CONTEXT;

EXPLAIN PLAN FOR
  SELECT DISTINCT
         XMLCast(XMLQuery('$p/PurchaseOrder/ShippingInstructions/address'
                          PASSING po.OBJECT_VALUE AS "p"
                          RETURNING CONTENT)
                 AS VARCHAR2(256)) "Address"
    FROM po_binxml po
    WHERE XMLExists('$p/PurchaseOrder/ShippingInstructions/address
                       [(# ora:use_text_index #) {ora:contains(., "$(Fortieth)")} > 0]'
                    PASSING po.OBJECT_VALUE AS "p");

------------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                         | Rows | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                              |    1 |  3046 |     12 (17)| 00:00:01 |
|  1 | SORT GROUP BY                       |                              |    1 |  3524 |            |          |
|* 2 | TABLE ACCESS BY INDEX ROWID BATCHED | MY_PATH_TABLE                |    2 |  7048 |      3  (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN                    | SYS89559_PO_XMLINDE_PIKEY_IX |    1 |       |      2  (0)| 00:00:01 |
|  4 | HASH UNIQUE                         |                              |    1 |  3046 |     12 (17)| 00:00:01 |
|  5 | NESTED LOOPS                        |                              |    1 |  3046 |      8 (13)| 00:00:01 |
|  6 | SORT UNIQUE                         |                              |    1 |  3034 |      6  (0)| 00:00:01 |
|* 7 | TABLE ACCESS BY INDEX ROWID         | MY_PATH_TABLE                |    1 |  3034 |      6  (0)| 00:00:01 |
|* 8 | DOMAIN INDEX                        | PO_OTEXT_IX                  |      |       |      4  (0)| 00:00:01 |
|  9 | TABLE ACCESS BY USER ROWID          | PO_BINXML                    |    1 |    12 |      1  (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SYS_XMLI_LOC_ISNODE("SYS_P3"."LOCATOR")=1)
   3 - access("SYS_P3"."RID"=:B1 AND "SYS_P3"."PATHID"=HEXTORAW('6F7F'))
   7 - filter("SYS_P1"."PATHID"=HEXTORAW('6F7F') AND SYS_XMLI_LOC_ISNODE("SYS_P1"."LOCATOR")=1)
   8 - access("CTXSYS"."CONTAINS"("SYS_P1"."VALUE",'$(Fortieth)')>0)
Note
-----
   - dynamic sampling used for this statement (level=2)

28 rows selected.
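The rewrite conditions listed above can be collected into a single checklist. This Python sketch is a hypothetical summary, not an Oracle API. It assumes that policy_matches_index also covers the NULL_SECTION_GROUP requirement. A True result means only that the rewrite is possible; the optimizer still decides whether to use the CONTEXT index.

```python
def rewrite_may_use_context_index(storage, target_maps_to_column,
                                  context_index_column, transactional,
                                  policy_matches_index, use_text_index_pragma):
    """Hypothetical checklist for the conditions described above.
    storage: "object-relational" or "binary-xmlindex".
    context_index_column (binary XML only): "path-table VALUE" or
    "structured scalar-value"."""
    if not policy_matches_index:
        return False
    if storage == "object-relational":
        # input_text must map to a single relational column that has a
        # transactional CONTEXT index on it
        return target_maps_to_column and transactional
    if storage == "binary-xmlindex":
        if context_index_column not in ("path-table VALUE", "structured scalar-value"):
            return False
        # A non-transactional CONTEXT index also needs pragma ora:use_text_index
        return transactional or use_text_index_pragma
    return False

# The situation in Example 12-51: binary XML, non-transactional index on the
# path-table VALUE column, pragma present
print(rewrite_may_use_context_index("binary-xmlindex", False, "path-table VALUE",
                                    transactional=False, policy_matches_index=True,
                                    use_text_index_pragma=True))  # True
```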
HasPathArg           ::= LocationPath | EqualityExpr
InPathArg            ::= LocationPath
LocationPath         ::= RelativeLocationPath | AbsoluteLocationPath
AbsoluteLocationPath ::= ("/" RelativeLocationPath) | ("//" RelativeLocationPath)
RelativeLocationPath ::= Step | (RelativeLocationPath "/" Step) | (RelativeLocationPath "//" Step)
Step                 ::= ("@" NCName) | NCName | (NCName Predicate) | Dot | "*"
Predicate            ::= ("[" OrExpr "]") | ("[" Digit+ "]")
OrExpr               ::= AndExpr | (OrExpr "or" AndExpr)
AndExpr              ::= BooleanExpr | (AndExpr "and" BooleanExpr)
BooleanExpr          ::= RelativeLocationPath | EqualityExpr | ("(" OrExpr ")") | ("not" "(" OrExpr ")")
EqualityExpr         ::= (RelativeLocationPath "=" Literal) | (Literal "=" RelativeLocationPath)
                       | (RelativeLocationPath "!=" Literal) | (Literal "!=" RelativeLocationPath)
Literal              ::= (DoubleQuote [~"]* DoubleQuote) | (SingleQuote [~']* SingleQuote)
NCName               ::= (Letter | Underscore) NCNameChar*
NCNameChar           ::= Letter | Digit | Dot | Dash | Underscore
Letter               ::= ([a-z] | [A-Z])
Digit                ::= [0-9]
Dot                  ::= "."
Dash                 ::= "-"
Underscore           ::= "_"
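The NCName production in this grammar translates directly into a regular expression. This Python sketch checks whether a string is an NCName under the grammar's ASCII-only Letter and Digit rules. The names NCNAME and is_ncname are introduced here for illustration.

```python
import re

# NCName     ::= (Letter | Underscore) NCNameChar*
# NCNameChar ::= Letter | Digit | Dot | Dash | Underscore   (ASCII only)
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")

def is_ncname(s):
    return NCNAME.match(s) is not None

print(is_ncname("purchaseOrder"))  # True
print(is_ncname("_item.v2-x"))     # True
print(is_ncname("2ndItem"))        # False: must start with a letter or "_"
```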
Example 12-52 Purchase Order XML Document, po001.xml
<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="xmlschema/po.xsd"
               orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <comment>Hurry, my lawn is going wild!</comment>
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
      <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
      <shipDate>1999-05-21</shipDate>
    </item>
  </items>
</purchaseOrder>
Example 12-53 Create Table PURCHASE_ORDERS
CREATE TABLE purchase_orders (id NUMBER, doc VARCHAR2(4000));

INSERT INTO purchase_orders (id, doc)
  VALUES (1,
'<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="xmlschema/po.xsd"
               orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <comment>Hurry, my lawn is going wild!</comment>
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
      <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
      <shipDate>1999-05-21</shipDate>
    </item>
  </items>
</purchaseOrder>');

COMMIT;
Example 12-54 Create Table PURCHASE_ORDERS_XMLTYPE
CREATE TABLE purchase_orders_xmltype (id NUMBER, doc XMLType);

INSERT INTO purchase_orders_xmltype (id, doc)
  VALUES (1,
          XMLTYPE ('<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="po.xsd"
               orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <comment>Hurry, my lawn is going wild!</comment>
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
      <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
      <shipDate>1999-05-21</shipDate>
    </item>
  </items>
</purchaseOrder>'));

COMMIT;
Example 12-55 Create Table PURCHASE_ORDERS_XMLTYPE_TABLE
CREATE TABLE purchase_orders_xmltype_table OF XMLType;

INSERT INTO purchase_orders_xmltype_table
  VALUES (
    XMLType ('<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="xmlschema/po.xsd"
               orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <comment>Hurry, my lawn is going wild!</comment>
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
      <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
      <shipDate>1999-05-21</shipDate>
    </item>
  </items>
</purchaseOrder>'));

COMMIT;
Example 12-56 Purchase-Order XML Schema for Full-Text Search Examples
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Purchase order schema for Example.com.
      Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items" type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>
  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="state" type="xsd:string"/>
      <xsd:element name="zip" type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
  </xsd:complexType>
  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice" type="xsd:decimal"/>
            <xsd:element ref="comment" minOccurs="0"/>
            <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" type="SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
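The SKU simple type constrains partNum values with the pattern \d{3}-[A-Z]{2}. XML Schema patterns are implicitly anchored to the whole value, so this Python sketch uses re.fullmatch to check the same pattern against the part numbers in the sample document. The names SKU and is_valid_sku are introduced here for illustration.

```python
import re

# Same pattern as the SKU simple type: three digits, "-", two uppercase letters.
# XML Schema patterns match the whole value, hence fullmatch.
SKU = re.compile(r"\d{3}-[A-Z]{2}")

def is_valid_sku(part_num):
    return SKU.fullmatch(part_num) is not None

print(is_valid_sku("872-AA"))  # True
print(is_valid_sku("926-AA"))  # True
print(is_valid_sku("872-aa"))  # False: letters must be uppercase
```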