Toolsmith [Noun]

  1. a person who makes tools
  2. (computing) a person who creates utility programs

Utility Program [Noun]

  1. (computing) any of a large range of software, often included with the operating system [or framework, or application, or…], that runs specific tasks associated with the computer, [application,] or its environment

A good Toolsmith solves problems.

A good utility targets one issue, one problem, typically Edge Cases: the unknown patterns. A good way to solve most problems is to feed the input to an algorithm. Very functional. Because a utility deals with edge cases, one of the best tools to have in your toolbox is Regular Expressions, or RegEx.

In my travels I don't see enough usage of RegExes within development. They can be so powerful. As they say, "With great power comes great responsibility." RegExes aren't easy to wield, and most usage comes from very common patterns that one can find on Stack Overflow and that typically get pasted straight into code (validating email addresses, for example, has over 8,800 search results).

There are a ton of tutorials and online generators/validators that deal with just RegEx. So what can I add? I've been using RegExes for over 17 years now. It all started with monthly crash sessions led by our CTO, Keith Barrett. Thanks, Keith.

So, what can I add? A lot, really.

A couple of basics that you just need to commit to memory:

  • () anything within parentheses is treated as a group. Not unlike basic 4th grade math (my eldest daughter just finished that grade, where they learned the order in which to solve math problems with the PEMDAS acronym: Parentheses, Exponents, Multiplication, Division, Addition, Subtraction, or "Please Excuse My Dear Aunt Sally"), but I digress…
  • ^ the caret denotes the start of a line
  • $ the dollar sign denotes the end of a line
  • (Carets before dollars, 'c' before 'd')
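Here's a small sketch of those basics using Python's built-in re module (Python is just my pick for these examples; the patterns don't depend on it, and the r"..." raw strings only keep Python from eating the backslashes). The grouping example borrows the + quantifier from the Bonus section below.

```python
import re

# Parentheses group, much like in PEMDAS: here + repeats the whole group "ab"...
grouped = re.fullmatch(r"(ab)+", "ababab")
# ...while without them, + only repeats the "b", so the same string fails.
ungrouped = re.fullmatch(r"ab+", "ababab")

line = "Hello World"
# ^ anchors the pattern to the start of the line, $ to the end.
starts_hello = re.search(r"^Hello", line)
starts_world = re.search(r"^World", line)  # None: "World" isn't at the start
ends_world = re.search(r"World$", line)

print(grouped is not None, ungrouped is not None)  # True False
print(starts_hello is not None, starts_world is not None, ends_world is not None)  # True False True
```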

Ok, so let's get to our first RegEx:

Replace At The End/Start Of A Line

Search: ^(.) at the start of a line, or (.)$ at the end of a line
Replace: str

Notice that the caret precedes the parenthesis group, while the dollar sign comes at the end.

RegExes can be used within a utility, in processing page scrapes, or to find/replace within your IDE of choice. This particular RegEx is common in all three use cases: a utility may need to swap out info, a page scrape may need to replace HTML tags, and your IDE's find and replace can add missing semicolons.
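As a sketch of the IDE use case, here's the end-of-line pattern adding semicolons to a hypothetical three-line snippet, assuming Python's re.sub. It borrows the \1 back reference explained a little further down, and it assumes the re.MULTILINE flag so that $ matches at the end of every line rather than only at the end of the whole string.

```python
import re

# Hypothetical snippet: three statements missing their trailing semicolons.
source = "let a = 1\nlet b = 2\nconsole.log(a + b)"

# re.MULTILINE makes $ match before every line break, not just at the end of the string.
# (.) captures the last character so \1 can put it back, followed by ";".
fixed = re.sub(r"(.)$", r"\1;", source, flags=re.MULTILINE)

print(fixed)
```

Because (.) needs at least one character to match, blank lines are left alone instead of getting a stray semicolon.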

Ok, so (.)$ searches for a period at the end of the line? Hmm. Not quite. As you may have guessed, the period is a special character; it would need to be escaped, as in (\.)$, to search for a literal period at the end of a line.

No, in this instance the period denotes anything: any character (alphanumeric), symbol (emoji, punctuation, etc.), or non-printing character (tabs, etc.). It does not match line breaks by default, because the first RegEx engines operated line by line, so you never needed to search for line breaks. Modern libraries let you include them by changing an option (Python calls it re.DOTALL; Perl and .NET call it "single-line" mode), but again, I digress with too much info.
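A quick check of that line-break behavior, assuming Python's re module, where the option is the re.DOTALL flag:

```python
import re

text = "Hello\nWorld"

# By default the dot won't match the line break, so .+ stops at the end of "Hello".
default = re.match(r".+", text).group(0)

# re.DOTALL lets the dot match "\n" as well, so .+ runs to the end of the string.
dotall = re.match(r".+", text, flags=re.DOTALL).group(0)

print(repr(default))  # 'Hello'
print(repr(dotall))   # 'Hello\nWorld'
```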

Also note that the lone period (or dot) is singular: it matches exactly one character, and anchored by the $, that's the very last character on the line.
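To see both points, the unescaped dot grabbing exactly one last character of any kind and the escaped \. matching only a real period, here's a sketch with Python's re.search:

```python
import re

# Unescaped, the dot matches ANY single last character: here, the "!".
any_last = re.search(r"(.)$", "Hello World!").group(1)

# Escaped, \. only matches a literal period, so this finds nothing.
no_period = re.search(r"(\.)$", "Hello World!")

# ...but it does match when the line really ends in a period.
period = re.search(r"(\.)$", "Goodbye World.").group(1)

print(any_last, no_period, period)  # ! None .
```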

That's the search. Now for the replace. You have two choices:

  1. Any literal string will replace whatever the last character was.
    search (.)$ replace _end on this string: Hello World! would result in: Hello World_end
    search (.)$ replace _end on string: Where have you been would result in: Where have you bee_end
  2. Or you can keep what you found and add your replace string to it. To do this, you'd need to back reference what your search captured in the parenthesized group. A back reference is denoted by an escape (a backslash) and a number, where \1 refers to the first group. For example:
    Search for (.)$ and replace with \1_end. Technically, this reads: "replace the found character" (the searched-for (.)) "with that same found character" (the back reference \1) "and _end".
    search (.)$ replace \1_end on string Hello World! would result in: Hello World!_end (the back reference in this search would be the exclamation point).
    search (.)$ replace _end\1 on string Where have you been results in: Where have you bee_endn
    search (.)$ replace _end\1 on string Hello World! results in: Hello World_end!
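The examples above can be reproduced with Python's re.sub; replace_last is just a hypothetical helper name for this sketch. In Python the replacement is written as a raw string, r"\1_end", so the backslash survives for re to see. Some tools write back references as $1 instead (JavaScript's String.prototype.replace, for one), so check your editor's docs.

```python
import re

def replace_last(s: str, replacement: str) -> str:
    """Swap the last character of s for replacement; (.) captures it as group 1."""
    return re.sub(r"(.)$", replacement, s)

for s in ["Hello World!", "Where have you been"]:
    print(replace_last(s, "_end"))     # literal: the last character is dropped
    print(replace_last(s, r"\1_end"))  # back reference first: keep it, then append _end
    print(replace_last(s, r"_end\1"))  # back reference last: _end slides in before it
```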

Hopefully this makes sense. There always seems to be some exception to a RegEx rule or character, and this simple pattern is no different. Swap the $ at the end for a ^ at the start, and it works on the first character of the line instead:

Search ^(.) replace _end on string Hello World! results in: _endello World!
Search ^(.) replace _end on string Where have you been results in: _endhere have you been
Now with a back reference:
Search ^(.) replace _end\1 on string Hello World! results in: _endHello World!
Search ^(.) replace _end\1 on string Where have you been results in: _endWhere have you been
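The start-of-line versions work the same way in Python; replace_first is another hypothetical helper name for this sketch.

```python
import re

def replace_first(s: str, replacement: str) -> str:
    """Swap the first character of s for replacement; ^(.) captures it as group 1."""
    return re.sub(r"^(.)", replacement, s)

print(replace_first("Hello World!", "_end"))            # _endello World!
print(replace_first("Where have you been", r"_end\1"))  # _endWhere have you been
```

Flipping the order to \1_end keeps the first character and puts _end after it, giving H_endello World! for the first string.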

This should serve as a great primer on the powerful world of RegExes. A tool I use a lot outside of my IDE, Sublime Text, is TextWrangler, which has a great RegEx-enabled Find/Replace. Online there are a number of testers and validators. This is one of my favorites.

Bonus: Quantifiers

Above we discussed how to search for one character via the period/dot. What about searching for the last two characters? Would it be two dots, (..)$? Yes, it could. Three dots would match the last three characters. That gets sloppy and confusing once many characters and ranges of characters are introduced to your search pattern. So, there are proper ways to notate quantities, called quantifiers:
  • a* or (.*)$ will match zero or more 'a's or characters.
  • a+ will match one or more literal 'a's (or, in our examples above, (.+)$ would match one or more characters, but that's tricky: matching starts at the first character and + is "greedy," grabbing as much as it can, so it actually selects the entire line).
  • a? or (.?)$ will match zero or one 'a' or character. So with the dot, (.?)$ still gets the last character.
  • a{5} will match exactly five.
  • a{2,} will match two or more.
  • a{1,3} will match between one and three.
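A quick way to see these quantifiers in action is to run each one against a run of a's. This sketch uses Python's re.search, which returns the leftmost match, so it also shows a surprise: at the very first character, a* and a? can already succeed by matching zero a's. It ends with the greedy (.+)$ from above and the tidier {2} spelling of (..)$.

```python
import re

text = "b" + "a" * 5 + "d"  # "baaaaad": five a's between b and d

patterns = ["a*", "a+", "a?", "a{5}", "a{2,}", "a{1,3}"]

# Leftmost match wins: at position 0 ("b"), a* and a? match zero a's and come back empty.
results = {p: re.search(p, text).group(0) for p in patterns}
for p, found in results.items():
    print(f"{p!r:10} -> {found!r}")

line = "Hello World!"
# Greedy: matching starts at the first character and .+ grabs everything up to $.
greedy = re.search(r"(.+)$", line).group(1)
# (.?)$ ends up with just the last character.
optional = re.search(r"(.?)$", line).group(1)
# Two dots and {2} are equivalent; {2} just stays readable as the count grows.
two_dots = re.search(r"(..)$", line).group(1)
exactly_two = re.search(r"(.{2})$", line).group(1)

print(greedy, optional, two_dots, exactly_two)  # Hello World! ! d! d!
```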