HREF. This is probably one of the most searched-for regexes on Stack Overflow (searches for link, href, and anchor return more than 18,000 results).
Okay, so here's the regex:
<a\s+href[^>]+>
There are two regex specifics, and the rest is just plain text to be taken as written.
<a says to look for code that has a less-than sign followed immediately by the letter a.
The next section, \s+, says to look for whitespace: a space, tab, or newline (you have to escape the s; otherwise the expression will look for the literal s character). The plus sign + tells the expression to look for 1 or more.
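To see why the backslash matters, here is a minimal sketch using Python's re module. Python is my choice for illustration; nothing above ties the regex to a particular language, and the sample strings are made up.

```python
import re

# \s+ matches one or more whitespace characters (space, tab, newline).
escaped = re.compile(r"<a\s+")
# Without the backslash, s+ matches one or more literal letter "s".
unescaped = re.compile(r"<as+")

print(bool(escaped.search('<a href="x">')))      # a space follows <a
print(bool(escaped.search('<a\t\thref="x">')))   # tabs count as whitespace too
print(bool(unescaped.search('<a href="x">')))    # no literal "s" follows <a
print(bool(unescaped.search('<assert>')))        # a literal "s" follows <a
```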
Then we are back to literal characters: href.
To summarize thus far: look for <a followed by one or more whitespace characters, followed by href.
The next section is a regex character group, which is denoted by the square brackets [], in this case [^>]. The caret ^ right after the opening bracket negates the group. Okay, so this group is going to look for anything -- spaces, single or double quotes, question marks, letters, numbers, whatever -- just NOT the greater-than sign (>), which denotes the closing of the anchor tag. The + sign after the square brackets says to, that's right, look for 1 or more characters within that character group.
Finally, look for a greater-than sign.
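Putting the pieces together, the whole expression can be tried out with a short sketch like the one below. It assumes Python's re module, and the sample markup is hypothetical, invented only to show what does and does not match.

```python
import re

# The pattern from the text: literal "<a", whitespace, literal "href",
# one or more characters that are not ">", then the closing ">".
ANCHOR = re.compile(r"<a\s+href[^>]+>")

# Hypothetical sample markup, made up for this illustration.
html = (
    '<p>See <a href="http://example.com/a?x=1">one</a> and '
    "<a   href='two.html' class=\"ext\">two</a>, "
    'but not <a name="top">this</a> or <abbr title="x">that</abbr>.</p>'
)

# With no capture groups, findall returns each full match as a string.
matches = ANCHOR.findall(html)
for tag in matches:
    print(tag)
```

Only the two opening tags where href directly follows <a are matched. A tag that puts another attribute first, such as <a class="x" href="...">, is skipped by this expression as written.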
Easy enough to switch out the href with a src (and the a with img) and you can be looking for an image tag.
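As a rough sketch of that swap, again assuming Python's re module and a made-up sample string: an image tag starts with <img rather than <a, so the tag name changes along with the attribute.

```python
import re

# Same shape as the anchor pattern, with "img" and "src" swapped in.
IMAGE = re.compile(r"<img\s+src[^>]+>")

# Hypothetical sample markup for illustration.
html = '<a href="pic.html"><img src="pic.png" alt="A picture"></a>'

images = IMAGE.findall(html)
print(images)
```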
Capitalization, you wonder? I know that FrontPage 97 likes to capitalize its tags: <A HREF="http://microserfs.com">. Well, there are two ways. First, you can throw in the case-insensitivity flag at the end, which tells the regex engine to ignore case for the whole expression: /<a\s+href[^>]+>/i (the exact flag syntax depends on the regex engine). Or, this works too: <[aA]\s+href[^>]+> (but just for the leading a, which denotes the anchor tag; an uppercase HREF would need the same treatment, as in [hH][rR][eE][fF]).
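The difference between the two approaches shows up in a quick comparison. This is a sketch assuming Python, where the case-insensitivity flag is passed as re.IGNORECASE rather than written after the pattern.

```python
import re

tag = '<A HREF="http://microserfs.com">'

# Option 1: the ignore-case flag applies to the whole expression.
whole = re.compile(r"<a\s+href[^>]+>", re.IGNORECASE)

# Option 2: a character group covers only the leading letter.
leading_only = re.compile(r"<[aA]\s+href[^>]+>")

print(bool(whole.search(tag)))                          # flag covers both A and HREF
print(bool(leading_only.search(tag)))                   # HREF is uppercase here
print(bool(leading_only.search('<A href="x.html">')))   # only the A is capitalized
```

The flag is usually the simpler choice, since the character-group version has to spell out every letter that might change case.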