본문 바로가기

Oracle/SQL Query

Oracle Regular Expression

Table 4-1 SQL Regular Expression Functions and Conditions

SQL Element Category Description
REGEXP_LIKE
Condition

Searches a character column for a pattern. Use this function in the WHERE clause of a query to return rows matching a regular expression. The condition is also valid in a constraint or as a PL/SQL function returning a boolean. The following WHERE clause filters employees with a first name of Steven or Stephen:

WHERE REGEXP_LIKE(first_name, '^Ste(v|ph)en$')
REGEXP_REPLACE
Function

Searches for a pattern in a character column and replaces each occurrence of that pattern with the specified string. The following function puts a space after each character in the country_name column:

REGEXP_REPLACE(country_name, '(.)', '\1 ')
REGEXP_INSTR 
Function 

Searches a string for a given occurrence of a regular expression pattern and returns an integer indicating the position in the string where the match is found. You specify which occurrence you want to find and the start position. For example, the following performs a boolean test for a valid email address in the email column:

REGEXP_INSTR(email, '\w+@\w+(\.\w+)+') > 0
REGEXP_SUBSTR
Function

Returns the substring matching the regular expression pattern that you specify. The following function uses the x flag to match the first string by ignoring spaces in the regular expression:

REGEXP_SUBSTR('oracle', 'o r a c l e', 1, 1, 'x')



Table 4-2 POSIX Metacharacters in Oracle Database Regular Expressions

Syntax Operator Name Description Example

.

Any Character — Dot

Matches any character in the database character set. If the n flag is set, it matches the newline character. The newline is recognized as the linefeed character (\x0a) on UNIX and Windows or the carriage return character (\x0d) on Macintosh platforms.

Note: In the POSIX standard, this operator matches any English character except NULL and the newline character.

The expression a.b matches the strings abb, acb, and adb, but does not match acc.

+

One or More — Plus Quantifier

Matches one or more occurrences of the preceding subexpression.

The expression a+ matches the strings a, aa, and aaa, but does not match bbb.

?

Zero or One — Question Mark Quantifier

Matches zero or one occurrence of the preceding subexpression.

The expression ab?c matches the strings abc and ac, but does not match abbc.

*

Zero or More — Star Quantifier

Matches zero or more occurrences of the preceding subexpression. By default, a quantifier match is greedy because it matches as many times as possible while still allowing the rest of the match to succeed.

The expression ab*c matches the strings ac, abc, and abbc, but does not match abb.

{m}

Interval—Exact Count

Matches exactly m occurrences of the preceding subexpression.

The expression a{3} matches the strings aaa, but does not match aa.

{m,}

Interval—At Least Count

Matches at least m occurrences of the preceding subexpression.

The expression a{3,} matches the strings aaa and aaaa, but does not match aa.

{m,n}

Interval—Between Count

Matches at least m, but not more than n occurrences of the preceding subexpression.

The expression a{3,5} matches the strings aaa, aaaa, and aaaaa, but does not match aa.

[ ... ]

Matching Character List

Matches any single character in the list within the brackets. The following operators are allowed within the list, but other metacharacters included are treated as literals:

  • Range operator: -

  • POSIX character class: [: :]

  • POSIX collation element: [. .]

  • POSIX character equivalence class: [= =]

A dash (-) is a literal when it occurs first or last in the list, or as an ending range point in a range expression, as in [#--]. A right bracket (]) is treated as a literal if it occurs first in the list.

Note: In the POSIX standard, a range includes all collation elements between the start and end of the range in the linguistic definition of the current locale. Thus, ranges are linguistic rather than byte values ranges; the semantics of the range expression are independent of character set. In Oracle Database, the linguistic range is determined by the NLS_SORT initialization parameter.

The expression [abc] matches the first character in the strings all, bill, and cold, but does not match any characters in doll.

[^ ... ]

Non-Matching Character List

Matches any single character not in the list within the brackets. Characters not in the non-matching character list are returned as a match. Refer to the description of the Matching Character List operator for an account of metacharacters allowed in the character list.

The expression [^abc] matches the character d in the string abcdef, but not the character a, b, or c. The expression [^abc]+ matches the sequence def in the string abcdef, but not a, b, or c.

The expression [^a-i] excludes any character between a and i from the search result. This expression matches the character j in the string hij, but does not match any characters in the string abcdefghi.

|

Or

Matches one of the alternatives.

The expression a|b matches character a or character b.

( ... )

Subexpression or Grouping

Treats the expression within parentheses as a unit. The subexpression can be a string of literals or a complex expression containing operators.

The expression (abc)?def matches the optional string abc, followed by def. Thus, the expression matches abcdefghi and def, but does not match ghi.

\n

Backreference

Matches the nth preceding subexpression, that is, whatever is grouped within parentheses, where n is an integer from 1 to 9. The parentheses cause an expression to be remembered; a backreference refers to it. A backreference counts subexpressions from left to right, starting with the opening parenthesis of each preceding subexpression. The expression is invalid if the source string contains fewer than n subexpressions preceding the \n.

Oracle supports the backreference expression in the regular expression pattern and the replacement string of the REGEXP_REPLACE function.

The expression (abc|def)xy\1 matches the strings abcxyabc and defxydef, but does not match abcxydef or abcxy.

A backreference enables you to search for a repeated string without knowing the actual string ahead of time. For example, the expression ^(.*)\1$ matches a line consisting of two adjacent instances of the same string.

\

Escape Character

Treats the subsequent metacharacter in the expression as a literal. Use a backslash (\) to search for a character that is normally treated as a metacharacter. Use consecutive backslashes (\\) to match the backslash literal itself.

The expression \+ searches for the plus character (+). It matches the plus character in the string abc+def, but does not match abcdef.

^

Beginning of Line Anchor

Matches the beginning of a string (default). In multiline mode, it matches the beginning of any line within the source string.

The expression ^def matches def in the string defghi but does not match def in abcdef.

$

End of Line Anchor

Matches the end of a string (default). In multiline mode, it matches the beginning of any line within the source string.

The expression def$ matches def in the string abcdef but does not match def in the string defghi.

[:class:]

POSIX Character Class

Matches any character belonging to the specified POSIX character class. You can use this operator to search for characters with specific formatting such as uppercase characters, or you can search for special characters such as digits or punctuation characters. The full set of POSIX character classes is supported.

Note: In English regular expressions, range expressions often indicate a character class. For example, [a-z] indicates any lowercase character. This convention is not useful in multilingual environments, where the first and last character of a given character class may not be the same in all languages. Oracle supports the character classes in Table 4-3 based on character class definitions in Globalization classification data.

The expression [[:upper:]]+ searches for one or more consecutive uppercase characters. This expression matches DEF in the string abcDEFghi but does not match the string abcdefghi.

[.element.]

POSIX Collating Element Operator

Specifies a collating element to use in the regular expression. The element must be a defined collating element in the current locale. Use any collating element defined in the locale, including single-character and multicharacter elements. The NLS_SORT initialization parameter determines supported collation elements.This operator lets you use a multicharacter collating element in cases where only one character would otherwise be allowed. For example, you can ensure that the collating element ch, when defined in a locale such as Traditional Spanish, is treated as one character in operations that depend on the ordering of characters.

The expression [[.ch.]] searches for the collating element ch and matches ch in string chabc, but does not match cdefg. The expression [a-[.ch.]] specifies the range a to ch.

[=character=]

POSIX Character Equivalence Class

Matches all characters that are members of the same character equivalence class in the current locale as the specified character.

The character equivalence class must occur within a character list, so the character equivalence class is always nested within the brackets for the character list in the regular expression.

Usage of character equivalents depends on how canonical rules are defined for your database locale. Refer to the Oracle Database Globalization Support Guide for more information on linguistic sorting and string searching.

The expression [[=n=]] searches for characters equivalent to n in a Spanish locale. It matches both N and ñ in the string El Niño.


See Also:

Oracle Database SQL Reference for syntax, descriptions, and examples of the REGEXP functions and conditions






참고 문서 주소 : http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14251/adfns_regexp.htm#sthref534