  /*
   * Portions Copyright 2005-2006 Sun Microsystems, Inc.  All Rights Reserved.
   * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
   *
   * This code is free software; you can redistribute it and/or modify it
   * under the terms of the GNU General Public License version 2 only, as
   * published by the Free Software Foundation.  Sun designates this
   * particular file as subject to the "Classpath" exception as provided
   * by Sun in the LICENSE file that accompanied this code.
  *
  * This code is distributed in the hope that it will be useful, but WITHOUT
  * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
  * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
  * version 2 for more details (a copy is included in the LICENSE file that
  * accompanied this code).
  *
  * You should have received a copy of the GNU General Public License version
  * 2 along with this work; if not, write to the Free Software Foundation,
  * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
  *
  * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
  * CA 95054 USA or visit www.sun.com if you need additional information or
  * have any questions.
  */
 
 /*
  *******************************************************************************
  * (C) Copyright IBM Corp. 1996-2005 - All Rights Reserved                     *
  *                                                                             *
  * The original version of this source code and documentation is copyrighted   *
  * and owned by IBM, These materials are provided under terms of a License     *
  * Agreement between IBM and Sun. This technology is protected by multiple     *
  * US and International patents. This notice and attribution to IBM may not    *
  * to removed.                                                                 *
  *******************************************************************************
  */
 
 package java.text;
 
 import sun.text.normalizer.NormalizerBase;
 
This class provides the method normalize which transforms Unicode text into an equivalent composed or decomposed form, allowing for easier sorting and searching of text. The normalize method supports the standard normalization forms described in Unicode Standard Annex #15 — Unicode Normalization Forms.

Characters with accents or other adornments can be encoded in several different ways in Unicode. For example, take the character A-acute. In Unicode, this can be encoded as a single character (the "composed" form):

      U+00C1    LATIN CAPITAL LETTER A WITH ACUTE
or as two separate characters (the "decomposed" form):

      U+0041    LATIN CAPITAL LETTER A
      U+0301    COMBINING ACUTE ACCENT
To a user of your program, however, both of these sequences should be treated as the same "user-level" character "A with acute accent". When you are searching or comparing text, you must ensure that these two sequences are treated as equivalent. In addition, you must handle characters with more than one accent. Sometimes the order of a character's combining accents is significant, while in other cases accent sequences in different orders are really equivalent.
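As a minimal sketch of the equivalence described above: the two encodings of A-acute differ as raw strings but compare equal once both are normalized to the same form. The class name `ComposedVsDecomposedDemo` and the helper `canonicallyEqual` are hypothetical names chosen for illustration; only `Normalizer.normalize` and `Normalizer.Form` come from this class.

```java
import java.text.Normalizer;

class ComposedVsDecomposedDemo {
    // Hypothetical helper: compares two strings after bringing both to NFC.
    static boolean canonicallyEqual(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String composed = "\u00C1";     // LATIN CAPITAL LETTER A WITH ACUTE
        String decomposed = "A\u0301";  // LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT

        // A plain comparison sees two different char sequences.
        System.out.println(composed.equals(decomposed));            // false
        // After normalization the two sequences are identical.
        System.out.println(canonicallyEqual(composed, decomposed)); // true
        // NFD splits the composed form into base letter plus combining mark.
        System.out.println(Normalizer.normalize(composed, Normalizer.Form.NFD).length()); // 2
    }
}
```

Normalizing both operands to NFD instead would work equally well for comparison; what matters is that both sides use the same form.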

Similarly, the string "ffi" can be encoded as three separate letters:

      U+0066    LATIN SMALL LETTER F
      U+0066    LATIN SMALL LETTER F
      U+0069    LATIN SMALL LETTER I
or as the single character

      U+FB03    LATIN SMALL LIGATURE FFI
The ffi ligature is not a distinct semantic character, and strictly speaking it shouldn't be in Unicode at all, but it was included for compatibility with existing character sets that already provided it. The Unicode standard identifies such characters by giving them "compatibility" decompositions into the corresponding semantic characters. When sorting and searching, you will often want to use these mappings.
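The compatibility mapping for the ffi ligature can be illustrated with a short sketch. It assumes a search-style use case in which ligatures should match their plain-letter spelling; `LigatureDemo` and `expandCompatibility` are hypothetical names, not part of this class.

```java
import java.text.Normalizer;

class LigatureDemo {
    // Hypothetical helper: applies compatibility decomposition (NFKD).
    static String expandCompatibility(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD);
    }

    public static void main(String[] args) {
        String withLigature = "o\uFB03ce"; // "office" written with LATIN SMALL LIGATURE FFI

        // Canonical decomposition leaves the ligature alone, because its
        // decomposition is a compatibility mapping, not a canonical one.
        String nfd = Normalizer.normalize(withLigature, Normalizer.Form.NFD);
        System.out.println(nfd.equals(withLigature)); // true

        // Compatibility decomposition replaces the ligature with f, f, i.
        System.out.println(expandCompatibility(withLigature)); // office
    }
}
```

Compatibility forms discard formatting distinctions, so they are usually applied to search keys or sort keys rather than to the stored text itself.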

The normalize method helps solve these problems by transforming text into the canonical composed and decomposed forms as shown in the first example above. In addition, you can have it perform compatibility decompositions so that you can treat compatibility characters the same as their equivalents. Finally, the normalize method rearranges accents into the proper canonical order, so that you do not have to worry about accent rearrangement on your own.
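The accent reordering mentioned above can be sketched as follows. The example uses a base letter with a dot below (U+0323) and an acute accent (U+0301), two marks that attach to different positions and can therefore be typed in either order without changing the meaning. `AccentOrderDemo` and `toNfd` are hypothetical names used only for this illustration.

```java
import java.text.Normalizer;

class AccentOrderDemo {
    // Hypothetical helper: canonical decomposition, which also puts
    // combining marks into canonical order.
    static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String acuteFirst = "a\u0301\u0323"; // a + acute + dot below
        String dotFirst = "a\u0323\u0301";   // a + dot below + acute

        // The raw strings differ only in the order of the combining marks.
        System.out.println(acuteFirst.equals(dotFirst));               // false
        // After normalization both use the same canonical order.
        System.out.println(toNfd(acuteFirst).equals(toNfd(dotFirst))); // true
    }
}
```

Marks that attach at the same position are not reordered, because in that case the order is significant.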

The W3C generally recommends exchanging text in NFC. Note also that most legacy character encodings use only precomposed forms and often do not encode any combining marks by themselves. For conversion to such character encodings, the Unicode text needs to be normalized to NFC first. For more usage examples, see Unicode Standard Annex #15.
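To make the point about legacy encodings concrete, here is a small sketch that uses ISO-8859-1 as the legacy encoding (an assumption made for illustration; the text above does not name a specific encoding). `LegacyEncodingDemo`, `encodable`, and `toLatin1` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

class LegacyEncodingDemo {
    // Hypothetical helper: true if every character of s can be encoded in cs.
    static boolean encodable(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    // Hypothetical helper: normalizes to NFC before encoding as ISO-8859-1.
    static byte[] toLatin1(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return nfc.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // e + COMBINING ACUTE ACCENT

        // ISO-8859-1 has no combining acute accent, so the decomposed
        // sequence cannot be encoded as it stands.
        System.out.println(encodable(decomposed, StandardCharsets.ISO_8859_1)); // false

        // After NFC the text is the single precomposed character U+00E9,
        // which ISO-8859-1 encodes as one byte.
        System.out.println(toLatin1(decomposed).length); // 1
    }
}
```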

Since:
1.6
public final class Normalizer {
    private Normalizer() {}

    
This enum provides constants of the four Unicode normalization forms that are described in Unicode Standard Annex #15 — Unicode Normalization Forms and two methods to access them.

Since:
1.6
    public static enum Form {

        
Canonical decomposition.
        NFD,

        
Canonical decomposition, followed by canonical composition.
        NFC,

        
Compatibility decomposition.
        NFKD,

        
Compatibility decomposition, followed by canonical composition.
        NFKC
    }

    
Normalize a sequence of char values. The sequence will be normalized according to the specified normalization form.

Parameters:
src The sequence of char values to normalize.
form The normalization form; one of Normalizer.Form.NFC, Normalizer.Form.NFD, Normalizer.Form.NFKC, Normalizer.Form.NFKD
Returns:
The normalized String
Throws:
java.lang.NullPointerException If src or form is null.
    public static String normalize(CharSequence src, Form form) {
        return NormalizerBase.normalize(src.toString(), form);
    }

    
Determines if the given sequence of char values is normalized.

Parameters:
src The sequence of char values to be checked.
form The normalization form; one of Normalizer.Form.NFC, Normalizer.Form.NFD, Normalizer.Form.NFKC, Normalizer.Form.NFKD
Returns:
true if the sequence of char values is normalized; false otherwise.
Throws:
java.lang.NullPointerException If src or form is null.
    public static boolean isNormalized(CharSequence src, Form form) {
        return NormalizerBase.isNormalized(src.toString(), form);
    }
}
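One possible way to combine the two public methods is to check with `isNormalized` first and call `normalize` only when the input is not yet in the target form. This is a sketch of a caller-side pattern, not something this class prescribes; `NormalizeIfNeededDemo` and `toNfc` are hypothetical names.

```java
import java.text.Normalizer;

class NormalizeIfNeededDemo {
    // Hypothetical helper: returns src in NFC, skipping the normalize call
    // when the input already is in that form.
    static String toNfc(CharSequence src) {
        if (Normalizer.isNormalized(src, Normalizer.Form.NFC)) {
            return src.toString();
        }
        return Normalizer.normalize(src, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "A\u0301"; // A + COMBINING ACUTE ACCENT

        System.out.println(Normalizer.isNormalized(decomposed, Normalizer.Form.NFC)); // false
        System.out.println(Normalizer.isNormalized(decomposed, Normalizer.Form.NFD)); // true
        System.out.println(toNfc(decomposed).equals("\u00C1"));                       // true
    }
}
```

Both methods throw `NullPointerException` when `src` or `form` is null, so a caller that may receive null input needs its own guard before using a helper like this.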