For some perverted reason, I HAD to try to write the best propercasing algorithm on Earth. This one does all of the following:
jouni heikniemi -> Jouni Heikniemi
jouni von lederhosen -> Jouni von Lederhosen
THE EYE OF THE TIGER -> The Eye of the Tiger
1250 MHZ -> 1250 MHz
RoNaLD MCDoNaLD, USa -> Ronald McDonald, USA
Enough babble; the code is up next.
using System;
using System.Text.RegularExpressions;

// Placeholder class name: the original post shows only the members below.
public static class ProperCaser {
    // CONFIGURATION:
    // The following words will always be in lower case (except at the start of the string)
    static string[] lowerCaseWords = { "of", "the", "and", "or", "a", "an", "von" };
    // The following prefixes will cause their next character to be uppercased
    // Note: Keep the first character uppercase when defining these; all else must be in lowercase
    static string[] upperCasePrefixes = { "Mc", "O'" };
    // The following words will always be presented in the case they have here.
    static string[] fixedCaseWords = { "USA", "NATO", "MHz" };

    /// <summary>
    /// Converts the given string into ProperCase.
    /// </summary>
    /// <param name="original">The original string, e.g. "THE EYE OF THE TIGER"</param>
    /// <returns>The string converted into ProperCase, e.g. "The Eye of the Tiger"</returns>
    public static string ProperCase(string original) {
        if (original == null || original.Length == 0) return "";
        // Run the original through the massage word-by-word
        string result =
            Regex.Replace(original.ToLower(), @"\b(\w+)\b", new MatchEvaluator(HandleSingleWord));
        // Always uppercase the first character
        return Char.ToUpper(result[0]) + (result.Length > 1 ? result.Substring(1) : "");
    }

    // This helper method properizes (sp?) the case of a single word (regex match)
    // NOTE: The input is in all lowercase as forced by the ProperCase method.
    private static string HandleSingleWord(Match m) {
        string word = m.Groups[1].Value;
        // Is this word defined as all-lowercase?
        foreach (string lcw in lowerCaseWords)
            if (word == lcw)
                return word;
        // Is this word defined as a fixed-case word?
        foreach (string fcw in fixedCaseWords)
            if (String.Compare(word, fcw, true) == 0)
                return fcw;
        // OK, this is a normal word; uppercase the first letter
        if (word.Length == 1)
            return Char.ToUpper(word[0]).ToString();
        word = Char.ToUpper(word[0]) + word.Substring(1);
        // Check if this word starts with one of the uppercasing prefixes
        // Note: Only one of the uppercasing prefixes is applied
        foreach (string ucPrefix in upperCasePrefixes)
            if (word.StartsWith(ucPrefix) && word.Length > ucPrefix.Length)
                return word.Substring(0, ucPrefix.Length) +
                    Char.ToUpper(word[ucPrefix.Length]) +
                    (word.Length > ucPrefix.Length + 1
                        ? word.Substring(ucPrefix.Length + 1)
                        : "");
        return word;
    }
}
Afterwards, I spotted a tiny programming error. I don't think it's going to be seen in any production application, but it can produce a slightly wrong result in certain situations. Can you spot it?
Posted by Jouni Heikniemi at October 3, 2004 10:21 PM
.net
The answer to the quiz above (think before you read!):
The error actually makes the "O'" uppercase prefix ineffective: it can never match. The word pattern \b(\w+)\b stops at an apostrophe, because an apostrophe is not a word character (so \b matches on both sides of it), and "O'Neill" is therefore handled as two separate words. That doesn't really matter, since the N gets uppercased anyway (it's at the start of a word). However, if you come up with really contrived examples such as the oh-so-useful string "o'a", you'll note it's cased as "O'a" (the "a" is treated as a separate word and is in lowerCaseWords), while by definition it should be "O'A". Same with "O'the" and so on.
It can be fixed by making the word split algorithm more robust: either complicate the regex or build the code on string splitting. I promise to come up with a fix if you show me a practical situation where the bug above can bite you. :-)
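To see the split for yourself, here is a small C# sketch that lists the substrings the original pattern \b(\w+)\b hands over to HandleSingleWord. The class and method names (BoundaryDemo, Tokens) are made up for this illustration; only the pattern comes from the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class: shows which "words" the original pattern produces.
public static class BoundaryDemo
{
    // Returns every Groups[1] value matched by \b(\w+)\b, joined with '|'
    // so the split points are easy to see.
    public static string Tokens(string input)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\b(\w+)\b"))
            words.Add(m.Groups[1].Value);
        return string.Join("|", words);
    }
}
```

Tokens("o'a") returns "o|a" and Tokens("ain't") returns "ain|t": the apostrophe is never part of a match, so no word handed to HandleSingleWord can start with "O'".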
Posted by: Jouni at October 3, 2004 10:27 PM
It doesn't handle sentences starting with a UNIX command name (first letter must be in lowercase)!
:-P
"Unix and proper case" is an oxymoron anyway. :-)
Posted by: Jouni at October 4, 2004 08:37 AM
Very useful ... thanks
Posted by: Rahul Guha at October 13, 2004 01:36 AM
'"THE MATRIX"' becomes '"the Matrix"' instead of '"The Matrix"' as one would expect.
Apart from that, wonderful work!
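David's example trips over the last line of ProperCase, which uppercases result[0], here the opening quotation mark. A minimal sketch, assuming you only want to change that final step: uppercase the first regex word character instead of the first character. FirstWordCharFix and UpperFirstWordChar are hypothetical names, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: a drop-in replacement for the last step of ProperCase.
public static class FirstWordCharFix
{
    // Uppercases the first regex word character (\w) in s and leaves any
    // leading punctuation or whitespace, such as a quotation mark, untouched.
    public static string UpperFirstWordChar(string s)
    {
        Match m = Regex.Match(s, @"\w");
        if (!m.Success)
            return s; // no word characters at all: nothing to uppercase
        int i = m.Index;
        return s.Substring(0, i) + Char.ToUpper(s[i]) + s.Substring(i + 1);
    }
}
```

Inside ProperCase, the last line would then become return FirstWordCharFix.UpperFirstWordChar(result);. Looking for \w rather than the first letter keeps a string such as "1st of the Month" from turning into "1St of the Month".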
Posted by: David at October 17, 2004 07:41 AM
Great work, but the bug you mentioned above causes the function to produce Ain'T, Don'T, Devil'S, etc. :)
Posted by: Name at January 1, 2005 02:57 AM
Good point. I never had a test case for that functionality. Suppose I need to fix that one soonish...
Posted by: Jouni at January 1, 2005 08:31 AM
Try changing the regex expression from \w to [\w\']+
That seemed to work for me.
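A sketch of Jay's idea, assuming the word pattern is widened so that apostrophes stay inside a word. It uses \w[\w']* rather than [\w']+ so that a match must begin with a word character; otherwise a leading quote, as in 'hello', would become the first character of the word and nothing would get capitalized. To keep it short, the sketch carries over only the uppercasing prefixes, not the lowerCaseWords and fixedCaseWords lists; the class name ApostropheAwareCasing is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch: word matching that keeps apostrophes inside words.
public static class ApostropheAwareCasing
{
    // Same convention as the original upperCasePrefixes:
    // first character uppercase, the rest lowercase.
    static readonly string[] upperCasePrefixes = { "Mc", "O'" };

    public static string CaseWords(string original)
    {
        // \w[\w']* starts with a word character and may then contain apostrophes,
        // so "ain't", "o'a" and "johnson's" each arrive as a single word.
        return Regex.Replace(original.ToLowerInvariant(), @"\w[\w']*",
                             m => CaseWord(m.Value));
    }

    static string CaseWord(string word)
    {
        // Uppercase the first character, as HandleSingleWord does for normal words.
        word = Char.ToUpperInvariant(word[0]) + word.Substring(1);
        // Apply at most one prefix rule: uppercase the character after the prefix.
        foreach (string prefix in upperCasePrefixes)
            if (word.StartsWith(prefix, StringComparison.Ordinal) && word.Length > prefix.Length)
                return word.Substring(0, prefix.Length)
                     + Char.ToUpperInvariant(word[prefix.Length])
                     + word.Substring(prefix.Length + 1);
        return word;
    }
}
```

In the original ProperCase, the equivalent change would be swapping @"\b(\w+)\b" for something like @"(\w[\w']*)", which keeps Groups[1] available to HandleSingleWord. That also covers the possessive case ("Johnson's") and makes the "O'" prefix live again, with the side effect that "o'clock" comes out as "O'Clock".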
Posted by: Jay Turpin at January 29, 2005 04:32 PM
can someone handle obrien??
Posted by: chris at September 8, 2005 08:40 PM
NON
Posted by: babak arbtan at October 10, 2005 10:59 AM
Looks good to me, thanks for the code!
Thank you - very well written.
Posted by: at March 21, 2006 02:54 AM
Thank you so much! Well written code.
Posted by: Leonard Lee at July 7, 2006 03:41 AM
"U.S.A." doesn't get handled properly - it turns into "U.S.a." Any ideas how to tweak this?
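One way to tweak this, sketched under the assumption that a post-processing step on ProperCase's output is acceptable. The last letter stays lowercase because each letter is matched as its own word and "a" is in lowerCaseWords. The hypothetical helper below uppercases any run of two or more letter-plus-dot pairs as a whole.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical post-processing step for dotted abbreviations such as "U.S.A.",
// meant to run on the string returned by ProperCase.
public static class DottedAbbreviations
{
    // Two or more consecutive "letter + dot" pairs starting at a word boundary,
    // e.g. "U.S.a." or "u.k.", are matched as one abbreviation.
    static readonly Regex Pattern = new Regex(@"\b(?:\p{L}\.){2,}");

    public static string UpperCase(string s)
    {
        return Pattern.Replace(s, m => m.Value.ToUpperInvariant());
    }
}
```

DottedAbbreviations.UpperCase(ProperCase("u.s.a.")) yields "U.S.A.". The catch is that it also turns "e.g." and "i.e." into "E.G." and "I.E.", so those would need an exception list if they occur in your data.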
Posted by: Eliezer at April 12, 2007 02:07 AM
Error when a possessive is used, like "Johnson's".
Posted by: afterburn at July 10, 2007 09:29 PM