# Collator - Android SDK | Android Developers

public abstract class Collator
extends Object implements Comparator<Object>, Freezable<Collator>, Cloneable

   ↳ android.icu.text.Collator

[icu enhancement] ICU's replacement for Collator. Methods, fields, and other functionality specific to ICU are labeled '[icu]'.

Collator performs locale-sensitive string comparison. A concrete subclass, RuleBasedCollator, allows customization of the collation ordering by the use of rule sets.

A Collator is thread-safe only when frozen. See isFrozen() and Freezable.

Following the Unicode Consortium's specifications for the Unicode Collation Algorithm (UCA), there are 5 different levels of strength used in comparisons:

  • PRIMARY strength: Typically, this is used to denote differences between base characters (for example, "a" < "b"). It is the strongest difference. For example, dictionaries are divided into different sections by base character.
  • SECONDARY strength: Accents in the characters are considered secondary differences (for example, "as" < "às" < "at"). Other differences between letters can also be considered secondary differences, depending on the language. A secondary difference is ignored when there is a primary difference anywhere in the strings.
  • TERTIARY strength: Upper and lower case differences in characters are distinguished at tertiary strength (for example, "ao" < "Ao" < "aò"). In addition, a variant of a letter differs from the base form on the tertiary strength (such as "A" and "Ⓐ"). Another example is the difference between large and small Kana. A tertiary difference is ignored when there is a primary or secondary difference anywhere in the strings.
  • QUATERNARY strength: When punctuation is ignored (see Ignoring Punctuations in the User Guide) at PRIMARY to TERTIARY strength, an additional strength level can be used to distinguish words with and without punctuation (for example, "ab" < "a-b" < "aB"). This difference is ignored when there is a PRIMARY, SECONDARY or TERTIARY difference. The QUATERNARY strength should only be used if ignoring punctuation is required.
  • IDENTICAL strength: When all other strengths are equal, the IDENTICAL strength is used as a tiebreaker. The Unicode code point values of the NFD form of each string are compared, just in case there is no difference. For example, Hebrew cantillation marks are only distinguished at this strength. This strength should be used sparingly, because strings that differ only in code point values are extremely rare. Using this strength substantially decreases the performance of both the comparison and collation key generation APIs. It also increases the size of the collation key.
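
The effect of the first three strength levels can be sketched as follows. This is a minimal illustration, not code from the reference: it uses the JDK's java.text.Collator as a stand-in so that it runs on a plain JVM (an assumption; on Android the import would be android.icu.text.Collator, which offers the same getInstance, setStrength and compare calls). The class name StrengthDemo and the helper equalAt are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthDemo {
    // Returns true when the two strings compare as equal at the given strength.
    static boolean equalAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // Case is a tertiary difference: ignored at PRIMARY, significant at TERTIARY.
        System.out.println(equalAt(Collator.PRIMARY, "abc", "ABC"));
        System.out.println(equalAt(Collator.TERTIARY, "abc", "ABC"));
        // An accent is a secondary difference: ignored at PRIMARY, significant at SECONDARY.
        System.out.println(equalAt(Collator.PRIMARY, "cafe", "caf\u00e9"));
        System.out.println(equalAt(Collator.SECONDARY, "cafe", "caf\u00e9"));
    }
}
```

QUATERNARY is specific to ICU and has no JDK counterpart, so it is left out of this sketch.
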
Unlike the JDK, ICU4J's Collator deals only with 2 decomposition modes, the canonical decomposition mode and one that does not use any decomposition. The compatibility decomposition mode, java.text.Collator.FULL_DECOMPOSITION is not supported here. If the canonical decomposition mode is set, the Collator handles un-normalized text properly, producing the same results as if the text were normalized in NFD. If canonical decomposition is turned off, it is the user's responsibility to ensure that all text is already in the appropriate form before performing a comparison or before getting a CollationKey.
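
When canonical decomposition is turned off, one way for a caller to meet the "appropriate form" requirement is to normalize text to NFD before comparing it. The sketch below only illustrates that idea; it relies on java.text.Normalizer from the JDK (an assumption, since the text above does not prescribe a normalization API), and the class name CanonicalEquivalenceDemo and helper toNfd are hypothetical.

```java
import java.text.Normalizer;

class CanonicalEquivalenceDemo {
    // Converts text to Normalization Form D (canonical decomposition).
    static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String precomposed = "\u00e0\u0325";  // a-grave followed by combining ring below
        String decomposed = "a\u0325\u0300";  // a, combining ring below, combining grave
        // The raw strings differ code unit by code unit.
        System.out.println(precomposed.equals(decomposed));               // false
        // Their NFD forms are identical, so they are canonically equivalent.
        System.out.println(toNfd(precomposed).equals(toNfd(decomposed))); // true
    }
}
```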

For more information about the collation service see the User Guide.

Examples of use

 // Get the Collator for US English and set its strength to PRIMARY
 Collator usCollator = Collator.getInstance(Locale.US);
 usCollator.setStrength(Collator.PRIMARY);
 if (usCollator.compare("abc", "ABC") == 0) {
     System.out.println("Strings are equivalent");
 }

 The following example shows how to compare two strings using the
 Collator for the default locale.

 // Compare two strings in the default locale
 Collator myCollator = Collator.getInstance();
 myCollator.setDecomposition(Collator.NO_DECOMPOSITION);
 if (myCollator.compare("à\u0325", "a\u0325̀") != 0) {
     System.out.println("à\u0325 is not equal to a\u0325̀ without decomposition");
     myCollator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
     if (myCollator.compare("à\u0325", "a\u0325̀") != 0) {
         System.out.println("Error: à\u0325 should be equal to a\u0325̀ with decomposition");
     } else {
         System.out.println("à\u0325 is equal to a\u0325̀ with decomposition");
     }
 } else {
     System.out.println("Error: à\u0325 should not be equal to a\u0325̀ without decomposition");
 }


Nested classes

interface Collator.ReorderCodes

Reordering codes for non-script groups that can be reordered under collation. 



Constants

int CANONICAL_DECOMPOSITION

Decomposition mode value.

int FULL_DECOMPOSITION

[icu] Note: This is for backwards compatibility with Java APIs only.

int IDENTICAL

Smallest Collator strength value.

int NO_DECOMPOSITION

Decomposition mode value.

int PRIMARY

Strongest collator strength value.

int QUATERNARY

[icu] Fourth level collator strength value.

int SECONDARY

Second level collator strength value.

int TERTIARY

Third level collator strength value.
Protected constructors


Collator()

Empty default constructor to make javadocs happy

Public methods

Object clone()

Clones the collator.

Collator cloneAsThawed()

Provides for the clone operation.

int compare(Object source, Object target)

Compares the source Object to the target Object.

abstract int compare(String source, String target)

Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode.

boolean equals(String source, String target)

Compares the equality of two text Strings using this Collator's rules, strength and decomposition mode.

boolean equals(Object obj)

Compares the equality of two Collator objects.

Collator freeze()

Freezes the collator.

static Locale[] getAvailableLocales()

Returns the set of locales, as Locale objects, for which collators are installed.

static final ULocale[] getAvailableULocales()

[icu] Returns the set of locales, as ULocale objects, for which collators are installed.

abstract CollationKey getCollationKey(String source)

Transforms the String into a CollationKey suitable for efficient repeated comparison.

int getDecomposition()

Returns the decomposition mode of this Collator.

static String getDisplayName(Locale objectLocale, Locale displayLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the displayLocale.

static String getDisplayName(ULocale objectLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale.

static String getDisplayName(ULocale objectLocale, ULocale displayLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the displayLocale.

static String getDisplayName(Locale objectLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale.

static int[] getEquivalentReorderCodes(int reorderCode)

Retrieves all the reorder codes that are grouped with the given reorder code.

static final ULocale getFunctionalEquivalent(String keyword, ULocale locID)

[icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.

static final ULocale getFunctionalEquivalent(String keyword, ULocale locID, boolean[] isAvailable)

[icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.

static final Collator getInstance()

Returns the Collator for the current default locale.

static final Collator getInstance(Locale locale)

Returns the Collator for the desired locale.

static final Collator getInstance(ULocale locale)

[icu] Returns the Collator for the desired locale.

static final String[] getKeywordValues(String keyword)

[icu] Given a keyword, returns an array of all values for that keyword that are currently in use.

static final String[] getKeywordValuesForLocale(String key, ULocale locale, boolean commonlyUsed)

[icu] Given a key and a locale, returns an array of string values in a preferred order that would make a difference.

static final String[] getKeywords()

[icu] Returns an array of all possible keywords that are relevant to collation.

int getMaxVariable()

[icu] Returns the maximum reordering group whose characters are affected by the alternate handling behavior.

int[] getReorderCodes()

Retrieves the reordering codes for this collator.

int getStrength()

Returns this Collator's strength attribute.

UnicodeSet getTailoredSet()

[icu] Returns a UnicodeSet that contains all the characters and sequences tailored in this collator.

abstract VersionInfo getUCAVersion()

[icu] Returns the UCA version of this collator object.

abstract int getVariableTop()

[icu] Gets the variable top value of a Collator.

abstract VersionInfo getVersion()

[icu] Returns the version of this collator object.

int hashCode()

Generates a hash code for this Collator object.

boolean isFrozen()

Determines whether the object has been frozen or not.

void setDecomposition(int decomposition)

Sets the decomposition mode of this Collator.

Collator setMaxVariable(int group)

[icu] Sets the variable top to the top of the specified reordering group.

void setReorderCodes(int... order)

Sets the reordering codes for this collator.

void setStrength(int newStrength)

Sets this Collator's strength attribute.

Constants



public static final int CANONICAL_DECOMPOSITION

Decomposition mode value. With CANONICAL_DECOMPOSITION set, characters that are canonical variants according to the Unicode standard will be decomposed for collation.

CANONICAL_DECOMPOSITION corresponds to Normalization Form D as described in Unicode Technical Report #15.

Constant Value: 17 (0x00000011)


public static final int FULL_DECOMPOSITION

[icu] Note: This is for backwards compatibility with Java APIs only. It should not be used; use IDENTICAL instead. ICU's collation does not support Java's FULL_DECOMPOSITION mode.

Constant Value: 15 (0x0000000f)


public static final int IDENTICAL

Smallest Collator strength value. When all other strengths are equal, the IDENTICAL strength is used as a tiebreaker. The Unicode code point values of the NFD form of each string are compared, just in case there is no difference. See class documentation for more explanation.

Note this value is different from the JDK's.

Constant Value: 15 (0x0000000f)


public static final int NO_DECOMPOSITION

Decomposition mode value. With NO_DECOMPOSITION set, Strings will not be decomposed for collation. This is the default decomposition setting unless otherwise specified by the locale used to create the Collator.

Note this value is different from the JDK's.

Constant Value: 16 (0x00000010)


public static final int PRIMARY

Strongest collator strength value. Typically used to denote differences between base characters. See class documentation for more explanation.

Constant Value: 0 (0x00000000)


public static final int QUATERNARY

[icu] Fourth level collator strength value. When punctuation is ignored (see Ignoring Punctuation in the User Guide) at PRIMARY to TERTIARY strength, an additional strength level can be used to distinguish words with and without punctuation. See class documentation for more explanation.

Constant Value: 3 (0x00000003)


public static final int SECONDARY

Second level collator strength value. Accents in the characters are considered secondary differences. Other differences between letters can also be considered secondary differences, depending on the language. See class documentation for more explanation.

Constant Value: 1 (0x00000001)


public static final int TERTIARY

Third level collator strength value. Upper and lower case differences in characters are distinguished at this strength level. In addition, a variant of a letter differs from the base form on the tertiary level. See class documentation for more explanation.

Constant Value: 2 (0x00000002)

Protected constructors


protected Collator ()

Empty default constructor to make javadocs happy

Public methods


public Object clone ()

Clones the collator.

Object a clone of this collator.



public Collator cloneAsThawed ()

Provides for the clone operation. Any clone is initially unfrozen.



public int compare (Object source, 
                Object target)

Compares the source Object to the target Object.

source Object: the source Object.

target Object: the target Object.

int Returns an integer value. Value is less than zero if source is less than target, value is zero if source and target are equal, value is greater than zero if source is greater than target.

ClassCastException thrown if either argument cannot be cast to CharSequence.


public abstract int compare (String source, 
                String target)

Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. See the Collator class description for an example of use.

source String: the source String.

target String: the target String.

int Returns an integer value. Value is less than zero if source is less than target, value is zero if source and target are equal, value is greater than zero if source is greater than target.

NullPointerException thrown if either argument is null.
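
Because Collator implements Comparator, it can be passed directly to a sort. The following is a minimal sketch rather than code from this reference: it uses java.text.Collator from the JDK as a stand-in so that it runs on a plain JVM (an assumption; on Android the import would be android.icu.text.Collator), and the class name CollatorSortDemo and helper sorted are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CollatorSortDemo {
    // Returns a new list ordered by the collation rules of the given locale.
    static List<String> sorted(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> copy = new ArrayList<>(words);
        copy.sort(collator); // Collator is a Comparator, so it plugs into sort directly
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Banana", "apple");
        // Plain String ordering would place "Banana" first because 'B' < 'a' in
        // code point order; the collator orders by base letter instead.
        System.out.println(sorted(words, Locale.US)); // [apple, Banana, cherry]
    }
}
```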


public boolean equals (String source, 
                String target)

Compares the equality of two text Strings using this Collator's rules, strength and decomposition mode. Convenience method.

source String: the source string to be compared.

target String: the target string to be compared.

boolean true if the strings are equal according to the collation rules, otherwise false.

NullPointerException thrown if either argument is null.


public boolean equals (Object obj)

Compares the equality of two Collator objects. Collator objects are equal if they have the same collation (sorting & searching) behavior.

The base class checks for null and for equal types. Subclasses should override.

obj Object: the Collator to compare to.

boolean true if this Collator has exactly the same collation behavior as obj, false otherwise.


public Collator freeze ()

Freezes the collator.

Collator the collator itself.


public static Locale[] getAvailableLocales ()

Returns the set of locales, as Locale objects, for which collators are installed. Note that Locale objects do not support RFC 3066.

Locale[] the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J.


public static final ULocale[] getAvailableULocales ()

[icu] Returns the set of locales, as ULocale objects, for which collators are installed. ULocale objects support RFC 3066.

ULocale[] the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J.


public abstract CollationKey getCollationKey (String source)

Transforms the String into a CollationKey suitable for efficient repeated comparison. The resulting key depends on the collator's rules, strength and decomposition mode.

Note that collation keys are often less efficient than simply doing comparison. For more details, see the ICU User Guide.

See the CollationKey class documentation for more information.

source String: the string to be transformed into a CollationKey.

CollationKey the CollationKey for the given String based on this Collator's collation rules. If the source String is null, a null CollationKey is returned.
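
A collation key is computed once per string and can then be compared repeatedly without consulting the collator again. The sketch below illustrates that pattern under stated assumptions: it uses java.text.Collator and java.text.CollationKey from the JDK as stand-ins so that it runs on a plain JVM (on Android the android.icu.text classes would be used), and the class name CollationKeySortDemo and helper sortWithKeys are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class CollationKeySortDemo {
    // Sorts the words by precomputed collation keys instead of repeated compare() calls.
    static List<String> sortWithKeys(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            keys.add(collator.getCollationKey(word)); // one key per string
        }
        Collections.sort(keys); // CollationKey is Comparable, so the keys compare directly
        List<String> result = new ArrayList<>();
        for (CollationKey key : keys) {
            result.add(key.getSourceString()); // recover the original text
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sortWithKeys(Arrays.asList("cherry", "Banana", "apple"), Locale.US));
    }
}
```

Whether keys pay off depends on how often each string is compared; for a single small sort, calling compare directly is likely simpler.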


public int getDecomposition ()

Returns the decomposition mode of this Collator. The decomposition mode determines how Unicode composed characters are handled.

See the Collator class description for more details.

The base class method always returns NO_DECOMPOSITION. Subclasses should override it if appropriate.

int the decomposition mode


public static String getDisplayName (Locale objectLocale, 
                Locale displayLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the displayLocale.

objectLocale Locale: the locale of the collator

displayLocale Locale: the locale for the collator's display name

String the display name


public static String getDisplayName (ULocale objectLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale.

objectLocale ULocale: the locale of the collator

String the display name


public static String getDisplayName (ULocale objectLocale, 
                ULocale displayLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the displayLocale.

objectLocale ULocale: the locale of the collator

displayLocale ULocale: the locale for the collator's display name

String the display name


public static String getDisplayName (Locale objectLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale.

objectLocale Locale: the locale of the collator

String the display name


public static int[] getEquivalentReorderCodes (int reorderCode)

Retrieves all the reorder codes that are grouped with the given reorder code. Some reorder codes are grouped and must reorder together. Beginning with ICU 55, scripts only reorder together if they are primary-equal, for example Hiragana and Katakana.

reorderCode int: The reorder code to determine equivalence for.

int[] the set of all reorder codes in the same group as the given reorder code.


public static final ULocale getFunctionalEquivalent (String keyword, 
                ULocale locID)

[icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.

keyword String: a particular keyword as enumerated by getKeywords.

locID ULocale: The requested locale

ULocale the locale


public static final ULocale getFunctionalEquivalent (String keyword, 
                ULocale locID, 
                boolean[] isAvailable)

[icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service. If two locales return the same result, then collators instantiated for these locales will behave equivalently. The converse is not always true; two collators may in fact be equivalent, but return different results, due to internal details. The return result has no other meaning than that stated above, and implies nothing as to the relationship between the two locales. This is intended for use by applications that wish to cache collators, or otherwise reuse collators when possible. The functional equivalent may change over time. For more information, please see the Locales and Services section of the ICU User Guide.

keyword String: a particular keyword as enumerated by getKeywords.

locID ULocale: The requested locale

isAvailable boolean: If non-null, isAvailable[0] will receive an output boolean that indicates whether the requested locale was 'available' to the collation service. If non-null, isAvailable must have length >= 1.

ULocale the locale


public static final Collator getInstance ()

Returns the Collator for the current default locale. The default locale is determined by java.util.Locale.getDefault().

Collator the Collator for the default locale (for example, en_US) if it is created successfully. Otherwise if there is no Collator associated with the current locale, the root collator will be returned.


public static final Collator getInstance (Locale locale)

Returns the Collator for the desired locale.

For some languages, multiple collation types are available; for example, "de-u-co-phonebk". Starting with ICU 54, collation attributes can be specified via locale keywords as well, in the old locale extension syntax ("el@colCaseFirst=upper", only with ULocale) or in language tag syntax ("el-u-kf-upper"). See User Guide: Collation API.

locale Locale: the desired locale.

Collator Collator for the desired locale if it is created successfully. Otherwise if there is no Collator associated with the current locale, the root collator will be returned.


public static final Collator getInstance (ULocale locale)

[icu] Returns the Collator for the desired locale.

For some languages, multiple collation types are available; for example, "de@collation=phonebook". Starting with ICU 54, collation attributes can be specified via locale keywords as well, in the old locale extension syntax ("el@colCaseFirst=upper") or in language tag syntax ("el-u-kf-upper"). See User Guide: Collation API.

locale ULocale: the desired locale.

Collator Collator for the desired locale if it is created successfully. Otherwise if there is no Collator associated with the current locale, the root collator will be returned.


public static final String[] getKeywordValues (String keyword)

[icu] Given a keyword, returns an array of all values for that keyword that are currently in use.

keyword String: one of the keywords returned by getKeywords.



public static final String[] getKeywordValuesForLocale (String key, 
                ULocale locale, 
                boolean commonlyUsed)

[icu] Given a key and a locale, returns an array of string values in a preferred order that would make a difference. These are all and only those values where the open (creation) of the service with the locale formed from the input locale plus input keyword and that value has different behavior than creation with the input locale alone.

key String: one of the keys supported by this service. For now, only "collation" is supported.

locale ULocale: the locale

commonlyUsed boolean: if set to true it will return only commonly used values with the given locale in preferred order. Otherwise, it will return all the available values for the locale.

String[] an array of string values for the given key and the locale.


public static final String[] getKeywords ()

[icu] Returns an array of all possible keywords that are relevant to collation. At this point, the only recognized keyword for this service is "collation".

String[] an array of valid collation keywords.


public int getMaxVariable ()

[icu] Returns the maximum reordering group whose characters are affected by the alternate handling behavior.

The base class implementation returns Collator.ReorderCodes.PUNCTUATION.

int the maximum variable reordering group.


public int[] getReorderCodes ()

Retrieves the reordering codes for this collator. These reordering codes are a combination of UScript codes and ReorderCodes.

int[] a copy of the reordering codes for this collator; if none are set then returns an empty array


public int getStrength ()

Returns this Collator's strength attribute. The strength attribute determines the minimum level of difference considered significant. [icu] Note: This can return QUATERNARY strength, which is not supported by the JDK version.

See the Collator class description for more details.

The base class method always returns TERTIARY. Subclasses should override it if appropriate.

int this Collator's current strength attribute.


public UnicodeSet getTailoredSet ()

[icu] Returns a UnicodeSet that contains all the characters and sequences tailored in this collator.

UnicodeSet a pointer to a UnicodeSet object containing all the code points and sequences that may sort differently than in the root collator.


public abstract VersionInfo getUCAVersion ()

[icu] Returns the UCA version of this collator object.

VersionInfo the version object associated with this collator


public abstract int getVariableTop ()

[icu] Gets the variable top value of a Collator.

int the variable top primary weight


public abstract VersionInfo getVersion ()

[icu] Returns the version of this collator object.

VersionInfo the version object associated with this collator


public int hashCode ()

Generates a hash code for this Collator object.

The implementation exists just for consistency with equals(java.lang.Object) implementation in this class and does not generate a useful hash code. Subclasses should override this implementation.

int a hash code value.


public boolean isFrozen ()

Determines whether the object has been frozen or not.

An unfrozen Collator is mutable and not thread-safe. A frozen Collator is immutable and thread-safe.



public void setDecomposition (int decomposition)

Sets the decomposition mode of this Collator. Setting this decomposition attribute with CANONICAL_DECOMPOSITION allows the Collator to handle un-normalized text properly, producing the same results as if the text were normalized. If NO_DECOMPOSITION is set, it is the user's responsibility to ensure that all text is already in the appropriate form before a comparison or before getting a CollationKey. Adjusting decomposition mode allows the user to select between faster and more complete collation behavior.

Since a great many of the world's languages do not require text normalization, most locales set NO_DECOMPOSITION as the default decomposition mode.

The base class method does nothing. Subclasses should override it if appropriate.

See getDecomposition for a description of decomposition mode.

decomposition int: the new decomposition mode

IllegalArgumentException If the given value is not a valid decomposition mode.


public Collator setMaxVariable (int group)

[icu] Sets the variable top to the top of the specified reordering group. The variable top determines the highest-sorting character which is affected by the alternate handling behavior. If that attribute is set to UCOL_NON_IGNORABLE, then the variable top has no effect.

The base class implementation throws an UnsupportedOperationException.

group int: one of Collator.ReorderCodes.SPACE, Collator.ReorderCodes.PUNCTUATION, Collator.ReorderCodes.SYMBOL, Collator.ReorderCodes.CURRENCY; or Collator.ReorderCodes.DEFAULT to restore the default max variable group

Collator this


public void setReorderCodes (int... order)

Sets the reordering codes for this collator. Collation reordering allows scripts and some other groups of characters to be moved relative to each other. This reordering is done on top of the DUCET/CLDR standard collation order. Reordering can specify groups to be placed at the start and/or the end of the collation order. These groups are specified using UScript codes and Collator.ReorderCodes entries.

By default, reordering codes specified for the start of the order are placed in the order given after several special non-script blocks. These special groups of characters are space, punctuation, symbol, currency, and digit. These special groups are represented with Collator.ReorderCodes entries. Script groups can be intermingled with these special non-script groups if those special groups are explicitly specified in the reordering.

The special code OTHERS stands for any script that is not explicitly mentioned in the list of reordering codes given. Anything that is after OTHERS will go at the very end of the reordering in the order given.

The special reorder code DEFAULT will reset the reordering for this collator to the default for this collator. The default reordering may be the DUCET/CLDR order or may be a reordering that was specified when this collator was created from resource data or from rules. The DEFAULT code must be the sole code supplied when it is used. If not, then an IllegalArgumentException will be thrown.

The special reorder code NONE will remove any reordering for this collator. The result of setting no reordering will be to have the DUCET/CLDR ordering used. The NONE code must be the sole code supplied when it is used.

order int: the reordering codes to apply to this collator; if this is null or an empty array then this clears any existing reordering


public void setStrength (int newStrength)

Sets this Collator's strength attribute. The strength attribute determines the minimum level of difference considered significant during comparison.

The base class method does nothing. Subclasses should override it if appropriate.

See the Collator class description for an example of use.

newStrength int: the new strength value.

IllegalArgumentException if the new strength value is not valid.
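
Callers that take a strength value from configuration or user input may want to guard against the IllegalArgumentException. The sketch below is one hedged way to do that; it uses java.text.Collator from the JDK as a stand-in so that it runs on a plain JVM (an assumption; the set of values a concrete android.icu.text.Collator subclass accepts may differ, for example QUATERNARY), and the class name StrengthGuardDemo and helper trySetStrength are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthGuardDemo {
    // Applies the strength if the collator accepts it; returns false and leaves
    // the collator unchanged if the value is rejected.
    static boolean trySetStrength(Collator collator, int strength) {
        try {
            collator.setStrength(strength);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        System.out.println(trySetStrength(collator, Collator.SECONDARY)); // accepted
        System.out.println(trySetStrength(collator, -1));                 // rejected by the JDK stand-in
        System.out.println(collator.getStrength() == Collator.SECONDARY); // previous value kept
    }
}
```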
