The result string is in title case, such as "Vuong Tru Blog": the first letter of each word is uppercase and the remaining letters are lowercase. Below are four solutions; the first two handle ASCII input only, and the last two work for Unicode strings as well.

Getting started

Here are some examples of corresponding inputs and outputs:

  • 'one 2 three' : 'One 2 Three'
  • 'one_2_three' : 'One_2_Three'
  • 'one-2-three' : 'One-2-Three'
  • 'one-2-thREE' : 'One-2-Three'
  • 'đạo đức' : 'Đạo Đức'

I think regular expressions are a suitable tool for this problem.

Method #1: Word RegEx (ASCII Only)

Let’s suppose the input string contains only ASCII characters.

The idea of this method is as follows:

  • Prepare a regular expression which can be used to match words of the input string
  • Use the replace(regex, replaceMatchingTextFunction) method of the input string to replace each matching word, one by one, with a modified word
  • A modified word is created by uppercasing the first letter and lowercasing the remaining letters of the original word

I think that regular expression should be /[a-zA-Z0-9]+/g to match words that contain alphanumeric characters.

Note:

  • I do not use the /\w+/g regular expression to match words because \w also matches the underscore character _
  • The g suffix flag tells the replace() method to replace all matching words instead of the first one
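
Both points in the note can be checked by running the patterns on a sample string. This is a minimal sketch; the variable names are illustrative and not part of the solution itself.

```javascript
const sample = 'one_2_three';

// \w treats the underscore as a word character, so the whole string is one "word".
const wordsWithW = sample.match(/\w+/g); // ['one_2_three']

// The explicit class stops at each underscore and yields three words.
const wordsWithClass = sample.match(/[a-zA-Z0-9]+/g); // ['one', '2', 'three']

// Without the g flag only the first match is replaced.
const firstOnly = 'one two'.replace(/[a-z]+/, 'X'); // 'X two'
const allMatches = 'one two'.replace(/[a-z]+/g, 'X'); // 'X X'
```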

Here is the code:

const convertStringToTitleCase = string => {
  return string.replace(/[a-zA-Z0-9]+/g, match => {
    const firstLetter = match.charAt(0).toUpperCase();
    const remainingLetters = match.slice(1).toLowerCase();
    return `${firstLetter}${remainingLetters}`;
  });
};

Results:

convertStringToTitleCase('one 2 three'); // 'One 2 Three'
convertStringToTitleCase('one_2_three'); // 'One_2_Three'
convertStringToTitleCase('one-2-three'); // 'One-2-Three'
convertStringToTitleCase('one-2-thREE'); // 'One-2-Three'

convertStringToTitleCase('đạo đức'); // 'đạO đứC', Oops!

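Okay, it works perfectly for ASCII strings. It fails for 'đạo đức' because [a-zA-Z0-9] does not match đ, ạ, or ứ, so only the trailing ASCII letters o and c are treated as words.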

Method #2: First Letter RegEx (ASCII Only)

This method is similar to the previous one and uses the same replace() method of the input string, but instead of replacing whole words it replaces only the first letter of each word.

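To match the first letter, I use the /\b\w/g regular expression, where \b is a zero-width assertion that matches a word boundary: a position between a word character (\w) and a non-word character, or the start or end of the string when a word character is adjacent to it.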
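
To see where \b actually matches, the following sketch marks every boundary position with a pipe character. The helper name markBoundaries is hypothetical and only serves this illustration.

```javascript
// Hypothetical helper: inserts "|" at every position where \b matches.
const markBoundaries = text => text.replace(/\b/g, '|');

markBoundaries('one-2 three'); // '|one|-|2| |three|'
markBoundaries('one_2');       // '|one_2|' (no boundary next to the underscore)

// 'đạo' written with escapes so the code points are unambiguous.
markBoundaries('\u0111\u1EA1o'); // 'đạ|o|' (only the ASCII "o" is a word character)
```

The last call shows why \b is not reliable for Unicode text: without further help, \w covers only ASCII letters, digits, and the underscore.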

Here is the code:

const convertStringToTitleCase = string => {
  return string.replace(/\b\w/g, match => match.toUpperCase());
};

Results:

convertStringToTitleCase('one 2 three'); // 'One 2 Three'
convertStringToTitleCase('one_2_three'); // 'One_2_three', Oops! (\w matches _, so there is no boundary before 2 or three)
convertStringToTitleCase('one-2-three'); // 'One-2-Three'

convertStringToTitleCase('one-2-thREE'); // 'One-2-ThREE', Oops!
convertStringToTitleCase('đạo đức'); // 'đạO đứC', Oops!

The output 'One-2-ThREE' is incorrect because the REE portion stays uppercase. That is expected: the replace() callback only uppercases the first letter and never touches the rest of the word. To fix this, lowercase the whole input string before calling replace().

const convertStringToTitleCase = string => {
  return string.toLowerCase().replace(/\b\w/g, match => match.toUpperCase());
};

You can re-check the output right now.
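
For instance, re-running the fixed function gives the results below. This is only a check of the code above, under the assumption that the Unicode sample is in its precomposed (NFC) form; it also records two inputs that the ASCII-only \b and \w still do not handle.

```javascript
const convertStringToTitleCase = string => {
  return string.toLowerCase().replace(/\b\w/g, match => match.toUpperCase());
};

convertStringToTitleCase('one-2-thREE'); // 'One-2-Three'
convertStringToTitleCase('one_2_three'); // 'One_2_three' (\w matches "_", so no boundary inside)
convertStringToTitleCase('đạo đức');     // 'đạO đứC' (still ASCII only)
```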

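Note that for ASCII input you can use the dot character . instead of \w after \b, since the dot matches any character except line terminators. The pattern /\b./g also matches the non-word character that follows each word, but uppercasing a space or punctuation mark leaves it unchanged.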
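
A short sketch of the dot variant, using a hypothetical function name withDot, shows that it produces the same results on the ASCII samples:

```javascript
// Same as method #2 after the fix, but with "." in place of "\w".
const withDot = string =>
  string.toLowerCase().replace(/\b./g, match => match.toUpperCase());

withDot('one 2 three'); // 'One 2 Three'
withDot('one-2-thREE'); // 'One-2-Three'
```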

Method #3: First Letter RegEx (Unicode)

The two above methods work for ASCII strings perfectly but not for Unicode strings. In this section, I am going to give you the solution to deal with this issue.

I use the same approach as method #2 to deal with the first letter of each word, but with a small modification. To make this easier to follow, I will break down the regular expression of method #2, /\b\w/g:

  • \b: I will use handmade word boundary characters instead: the starting position of the string, whitespace characters, and the special characters `~!@#$%^&+-*=_()[]{};:'"\|,<.>/?, so I have the first part (^|[\s`~!@#$%^&+\-*=_()[\]{};:'"\\|,<.>/?])
  • \w: I will use the second part [^\s`~!@#$%^&+\-*=_()[\]{};:'"\\|,<.>/?] to match the first letter, that is, any character that is not a handmade word boundary character; for simplicity you can use the dot character instead

Here is the code:

const REGEX = /(^|[\s`~!@#$%^&+\-*=_()[\]{};:'"\\|,<.>/?])[^\s`~!@#$%^&+\-*=_()[\]{};:'"\\|,<.>/?]/g;
// const REGEX = /(^|[\s`~!@#$%^&+\-*=_()[\]{};:'"\\|,<.>/?])./g;

const convertStringToTitleCase = string => {
  return string.toLowerCase().replace(REGEX, match => match.toUpperCase());
};

Results:

convertStringToTitleCase('one 2 three'); // 'One 2 Three'
convertStringToTitleCase('one_2_three'); // 'One_2_Three'
convertStringToTitleCase('one-2-three'); // 'One-2-Three'
convertStringToTitleCase('one-2-thREE'); // 'One-2-Three'
convertStringToTitleCase('đạo đức'); // 'Đạo Đức'
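
Because the same boundary set appears twice in the pattern, one option is to define it once and build the regular expression from it. The sketch below is only an illustration: the names BOUNDARY_CHARS, FIRST_LETTER_REGEX, and toTitleCase are hypothetical, and the symbol set is assumed to be whitespace plus the common ASCII punctuation characters.

```javascript
// Assumed boundary set: whitespace plus common ASCII punctuation, written
// with the escapes a character class needs (\- \] \\).
const BOUNDARY_CHARS = "\\s`~!@#$%^&+\\-*=_()[\\]{};:'\"\\\\|,<.>/?";

// (start of string OR one boundary character) followed by one non-boundary character.
const FIRST_LETTER_REGEX = new RegExp(
  '(^|[' + BOUNDARY_CHARS + '])[^' + BOUNDARY_CHARS + ']',
  'g'
);

const toTitleCase = string =>
  string.toLowerCase().replace(FIRST_LETTER_REGEX, match => match.toUpperCase());

toTitleCase('one_2_three'); // 'One_2_Three'
toTitleCase('one-2-thREE'); // 'One-2-Three'
```

Adding or removing a boundary character then takes a single change instead of two.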

Method #4: Word RegEx (Unicode)

Without further ado, I will modify the regular expression from method #1 to work for Unicode strings: a word is now any run of characters that are not handmade word boundary characters.

Here is the code:

const convertStringToTitleCase = string => {
  return string.replace(/[^\s`~!@#$%^&+\-*=_()[\]{};:'"\\|,<.>/?]+/g, match => {
    const firstLetter = match.charAt(0).toUpperCase();
    const remainingLetters = match.slice(1).toLowerCase();
    return `${firstLetter}${remainingLetters}`;
  });
};

Results:

convertStringToTitleCase('one 2 three'); // 'One 2 Three'
convertStringToTitleCase('one_2_three'); // 'One_2_Three'
convertStringToTitleCase('one-2-three'); // 'One-2-Three'
convertStringToTitleCase('one-2-thREE'); // 'One-2-Three'
convertStringToTitleCase('đạo đức'); // 'Đạo Đức'
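
As a possible alternative to the handmade character list, which is not one of the methods above, JavaScript engines that support ES2018 offer Unicode property escapes (\p{...} together with the u flag). The sketch below assumes a word is a run of letters, combining marks, and digits; the names WORD_REGEX and toTitleCaseUnicode are hypothetical.

```javascript
// Hypothetical alternative: a "word" is a run of letters (L), combining marks (M), and digits (N).
const WORD_REGEX = /[\p{L}\p{M}\p{N}]+/gu;

const toTitleCaseUnicode = string =>
  string.replace(WORD_REGEX, word => {
    // Spreading iterates by code point, so a surrogate pair is never split.
    const [first, ...rest] = [...word];
    return first.toUpperCase() + rest.join('').toLowerCase();
  });

toTitleCaseUnicode('one_2_three'); // 'One_2_Three'
toTitleCaseUnicode('one-2-thREE'); // 'One-2-Three'
toTitleCaseUnicode('đạo đức');     // 'Đạo Đức'
```

The trade-off is that the set of separators is no longer explicit: anything that is not a letter, mark, or digit splits words.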

Conclusion

Dealing with Unicode strings takes a bit more work: the technique is the same, but the regular expressions need more thought.