Some definitions before we start:

  • literal: A literal is any character we use in a search or matching expression. For example, to find ind in windows, the ind is a literal string: each character plays a part in the search, and it is literally the string we want to find.
  • metacharacter: A metacharacter is one or more special characters that have a unique meaning and are NOT used as literals in the search expression, for example, the character ^ (circumflex or caret) is a metacharacter.
  • target string: This term describes the string that we will be searching, that is, the string in which we want to find our match or search pattern.
  • search expression: Most commonly called the regular expression. This term describes the search expression that we will be using to search our target string, that is, the pattern we use to find what we want.
  • escape sequence: An escape sequence is a way of indicating that we want to use one of our metacharacters as a literal. In a regular expression, an escape sequence involves placing the metacharacter \ (backslash) in front of the metacharacter that we want to use as a literal. For example, to find (s) in the target string window(s) we use the search expression \(s\), and to find \\file in the target string :\\file we need the search expression \\\\file (each of the two \ characters we want to match as a literal is preceded by an escaping \).
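To make the literal and escape sequence definitions concrete, here is a minimal sketch using Foundation's NSRegularExpression. The helper name PatternFound is made up for this illustration. Keep in mind that Objective-C string literals need their own backslash escaping, so every \ in a search expression is written as \\ in source code.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when `pattern` matches somewhere in `target`.
static BOOL PatternFound(NSString *pattern, NSString *target) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    return [regex numberOfMatchesInString:target
                                  options:0
                                    range:NSMakeRange(0, target.length)] > 0;
}

int main(void) {
    @autoreleasepool {
        // Literal search: every character of "ind" must appear as written.
        NSLog(@"%d", PatternFound(@"ind", @"windows"));            // logs 1

        // The search expression \(s\) is written @"\\(s\\)" in source.
        NSLog(@"%d", PatternFound(@"\\(s\\)", @"window(s)"));       // logs 1
        NSLog(@"%d", PatternFound(@"\\(s\\)", @"windows"));         // logs 0

        // Unescaped, the parentheses only group the s, so a plain s matches.
        NSLog(@"%d", PatternFound(@"(s)", @"windows"));             // logs 1

        // The search expression \\\\file needs eight backslashes in source.
        NSLog(@"%d", PatternFound(@"\\\\\\\\file", @":\\\\file"));  // logs 1

        // escapedPatternForString: adds the escaping backslashes for us.
        NSString *escaped = [NSRegularExpression escapedPatternForString:@"window(s)"];
        NSLog(@"%d", PatternFound(escaped, @"window(s)"));          // logs 1
        NSLog(@"%d", PatternFound(escaped, @"windows"));            // logs 0
    }
    return 0;
}
```

Using escapedPatternForString: is usually the safer choice when the text to search for comes from user input rather than from a hand-written pattern.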

1. Brackets, Ranges and Negation

  • [ ] : Match any ONE of the characters inside the square brackets, for one character position only. For example, [12] matches 1 or 2.
  • - : The - (dash) inside square brackets is the ‘range separator’ and allows us to define a range, for example, [0-9] matches any single digit.
    • Note: To be matched as a literal, a - inside brackets must come first or last.
    • Note: There are no spaces between the range delimiter values.
  • ^ : The ^ (circumflex or caret) as the first character inside square brackets negates the expression, for example, [^0-9] matches any character that is not a digit.
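The bracket rules can be tried out with a short sketch like the one below. FirstMatch is a hypothetical helper written for this post; it relies on NSString's rangeOfString:options: with NSRegularExpressionSearch to return the first matching substring.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first substring of `target` matched by
// `pattern`, or nil when nothing matches.
static NSString *FirstMatch(NSString *pattern, NSString *target) {
    NSRange found = [target rangeOfString:pattern options:NSRegularExpressionSearch];
    return found.location == NSNotFound ? nil : [target substringWithRange:found];
}

int main(void) {
    @autoreleasepool {
        // [abc] matches exactly one character out of a, b or c.
        NSLog(@"%@", FirstMatch(@"[abc]", @"xyzb"));    // logs b

        // [0-9] is a range: any single digit.
        NSLog(@"%@", FirstMatch(@"[0-9]", @"win7"));    // logs 7
        NSLog(@"%@", FirstMatch(@"[0-9]", @"abc"));     // logs (null)

        // A leading ^ negates the set: the first character that is not a digit.
        NSLog(@"%@", FirstMatch(@"[^0-9]", @"42x"));    // logs x

        // A dash placed last is a literal dash, not a range separator.
        NSLog(@"%@", FirstMatch(@"[a-z-]", @"42-x"));   // logs -
    }
    return 0;
}
```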

2. Positioning (or Anchors)

  • ^ : The ^ (circumflex or caret) when not used inside square brackets means look only at the beginning of the target string.
  • $ : The $ (dollar) means look only at the end of the target string.
  • . : The . (period) means any single character in this position (by default, most engines exclude line breaks).
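A small sketch of how the anchors and the period behave, again with a made-up helper (PatternFound) around NSRegularExpression:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when `pattern` matches somewhere in `target`.
static BOOL PatternFound(NSString *pattern, NSString *target) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    return [regex numberOfMatchesInString:target
                                  options:0
                                    range:NSMakeRange(0, target.length)] > 0;
}

int main(void) {
    @autoreleasepool {
        // ^ pins the match to the start of the target string.
        NSLog(@"%d", PatternFound(@"^win", @"windows"));       // logs 1
        NSLog(@"%d", PatternFound(@"^dow", @"windows"));       // logs 0

        // $ pins the match to the end of the target string.
        NSLog(@"%d", PatternFound(@"ows$", @"windows"));       // logs 1
        NSLog(@"%d", PatternFound(@"win$", @"windows"));       // logs 0

        // . stands for exactly one character.
        NSLog(@"%d", PatternFound(@"w.n", @"windows"));        // logs 1
        NSLog(@"%d", PatternFound(@"w.n", @"wn"));             // logs 0

        // Both anchors together force the pattern to cover the whole string.
        NSLog(@"%d", PatternFound(@"^w.....s$", @"windows"));  // logs 1
    }
    return 0;
}
```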

3. Iteration ‘metacharacters’

The following iteration metacharacters control how many times the preceding character, bracket expression or group must occur for our search to match.

  • ? : The ? (question mark) matches when the preceding character occurs 0 or 1 times only.
  • * : The * (asterisk or star) matches when the preceding character occurs 0 or more times.
  • + : The + (plus) matches when the preceding character occurs 1 or more times.
  • {n} : Matches when the preceding character, or character range, occurs n times exactly.
  • {n,m} : Matches when the preceding character occurs at least n times but not more than m times.
  • {n,} : Matches when the preceding character occurs at least n times.
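The iteration metacharacters are easiest to understand by looking at how much text each one picks up. The following is only an illustrative sketch; FirstMatch is a hypothetical helper, not part of Foundation.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first substring of `target` matched by
// `pattern`, or nil when nothing matches.
static NSString *FirstMatch(NSString *pattern, NSString *target) {
    NSRange found = [target rangeOfString:pattern options:NSRegularExpressionSearch];
    return found.location == NSNotFound ? nil : [target substringWithRange:found];
}

int main(void) {
    @autoreleasepool {
        // ? : the u may occur 0 or 1 times.
        NSLog(@"%@", FirstMatch(@"colou?r", @"my color"));    // logs color
        NSLog(@"%@", FirstMatch(@"colou?r", @"my colour"));   // logs colour

        // * : 0 or more b characters, so "ac" is fine.
        NSLog(@"%@", FirstMatch(@"ab*c", @"ac"));             // logs ac

        // + : at least one b is required.
        NSLog(@"%@", FirstMatch(@"ab+c", @"ac"));             // logs (null)
        NSLog(@"%@", FirstMatch(@"ab+c", @"abbbc"));          // logs abbbc

        // {n}, {n,m} and {n,} applied to a digit range.
        NSLog(@"%@", FirstMatch(@"[0-9]{3}", @"12345"));      // logs 123
        NSLog(@"%@", FirstMatch(@"[0-9]{2,4}", @"12345"));    // logs 1234
        NSLog(@"%@", FirstMatch(@"[0-9]{2,}", @"12345"));     // logs 12345
        NSLog(@"%@", FirstMatch(@"[0-9]{3}", @"12"));         // logs (null)
    }
    return 0;
}
```

The last three successful matches differ only in the quantifier: each takes as many digits as its upper limit allows.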

4. More ‘metacharacters’

  • () : The ( (open parenthesis) and ) (close parenthesis) may be used to group (or bind) parts of our search expression together. Officially this is called a subexpression and subexpressions may be nested to any depth.
  • | : The | (vertical bar or pipe) is called alternation in techspeak and means find either the left-hand OR the right-hand value, for example, gr(a|e)y matches gray or grey.
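Grouping also lets us read back what each subexpression matched. Here is a minimal sketch using NSTextCheckingResult; the helper CapturedGroup and the sample strings are assumptions made for this example.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text captured by group `index` of the first
// match (index 0 is the whole match), or nil when there is no match or group.
static NSString *CapturedGroup(NSString *pattern, NSString *target, NSUInteger index) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:target options:0 range:NSMakeRange(0, target.length)];
    if (match == nil || index >= match.numberOfRanges) {
        return nil;
    }
    NSRange range = [match rangeAtIndex:index];
    return range.location == NSNotFound ? nil : [target substringWithRange:range];
}

int main(void) {
    @autoreleasepool {
        // Alternation inside a group: a OR e between gr and y.
        NSLog(@"%@", CapturedGroup(@"gr(a|e)y", @"grey", 0));   // logs grey
        NSLog(@"%@", CapturedGroup(@"gr(a|e)y", @"grey", 1));   // logs e

        // Alternation without a group applies to the whole expression.
        NSLog(@"%@", CapturedGroup(@"cat|dog", @"hotdog", 0));  // logs dog

        // A group can be repeated like a single character.
        NSLog(@"%@", CapturedGroup(@"(ab)+", @"ababab!", 0));   // logs ababab

        // Two groups: the scheme and everything after ://.
        NSString *url = @"https://example.com";
        NSLog(@"%@", CapturedGroup(@"^(http|https)://(.+)$", url, 1));  // logs https
        NSLog(@"%@", CapturedGroup(@"^(http|https)://(.+)$", url, 2));  // logs example.com
    }
    return 0;
}
```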

5. Common Extensions and Abbreviations

  • \d : Match any character in the range 0 - 9.
  • \D : Match any character NOT in the range 0 - 9.
  • \s : Match any whitespace character (space, tab etc.).
  • \S : Match any character that is NOT whitespace.
  • \w : Match any word character: 0 - 9, A - Z, a - z and _ (underscore).
  • \W : Match any character that is NOT a word character, which includes whitespace and all punctuation except _.
  • \b : Word boundary. Matches the position at the beginning (\bxx) and/or end (xx\b) of a word, without consuming a character.
  • \B : Not word boundary. Matches any position that is NOT at the beginning (\Bxx) and/or end (xx\B) of a word.

PS: Punctuation symbols: . , " ' ? ! ; : # $ % & ( ) * + - / < > = @ [ ] \ ^ _ { } | ~ (of these, only _ is matched by \w; the others are matched by \W).
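The abbreviations above can be checked with a few quick searches. This is a sketch only: FirstMatch is a hypothetical helper, and each backslash is doubled because the patterns are Objective-C string literals.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first substring of `target` matched by
// `pattern`, or nil when nothing matches.
static NSString *FirstMatch(NSString *pattern, NSString *target) {
    NSRange found = [target rangeOfString:pattern options:NSRegularExpressionSearch];
    return found.location == NSNotFound ? nil : [target substringWithRange:found];
}

int main(void) {
    @autoreleasepool {
        // \d+ : a run of digits.
        NSLog(@"%@", FirstMatch(@"\\d+", @"room 101"));            // logs 101

        // \w+ stops at the dash: _ is a word character, - is not.
        NSLog(@"%@", FirstMatch(@"\\w+", @"foo_bar-baz"));         // logs foo_bar
        NSLog(@"%@", FirstMatch(@"\\W", @"foo_bar-baz"));          // logs -

        // \S+ : a run of non-whitespace characters.
        NSLog(@"%@", FirstMatch(@"\\S+", @"  hi there"));          // logs hi

        // \b requires a word edge on both sides of cat.
        NSLog(@"%@", FirstMatch(@"\\bcat\\b", @"a cat sat"));      // logs cat
        NSLog(@"%@", FirstMatch(@"\\bcat\\b", @"concatenate"));    // logs (null)

        // \B requires that cat is NOT at a word edge.
        NSLog(@"%@", FirstMatch(@"\\Bcat\\B", @"concatenate"));    // logs cat
    }
    return 0;
}
```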

6. Some pragmatic gists for validating URLs in Objective-C

// IP address check: treats the string as an IP address when it contains no ASCII letters.
- (BOOL)isIPAddress {
    NSString *result = self;
    // Drop the scheme (the part before "//") so that its letters are not counted.
    if ([self containsProtocol]) {
        result = [result componentsSeparatedByString:@"//"].lastObject;
    }
    NSRegularExpression *regex = [[NSRegularExpression alloc] initWithPattern:@"[a-zA-Z]" options:0 error:NULL];
    NSUInteger matches = [regex numberOfMatchesInString:result options:0 range:NSMakeRange(0, result.length)];
    return (matches == 0);
}
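// IP address validation check: four dot-separated groups of digits with an optional :port.
// Note: each group only needs 1 to 3 digits, so values above 255 are not rejected.
- (BOOL)isValidIPAddress {
    NSString *result = self;
    // Drop the scheme (the part before "//") before testing.
    if ([self containsProtocol]) {
        result = [result componentsSeparatedByString:@"//"].lastObject;
    }
    NSString *ipRegEx = @"(\\d{1,3}\\.){3}(\\d){1,3}(:\\d{1,})?";
    // MATCHES succeeds only when the pattern covers the whole string.
    NSPredicate *ipTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", ipRegEx];
    return [ipTest evaluateWithObject:result];
}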
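The pattern in isValidIPAddress checks the shape of the address but not the value of each group. If that matters, one possible tightening is sketched below. This is not part of the original gist: MatchesWhole and StrictIPv4Pattern are hypothetical names, and the octet expression is just one common way to limit a group to 0 - 255.

```objective-c
#import <Foundation/Foundation.h>

// Same whole-string test the gist uses: MATCHES must cover the entire input.
static BOOL MatchesWhole(NSString *pattern, NSString *candidate) {
    NSPredicate *test = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
    return [test evaluateWithObject:candidate];
}

// Hypothetical stricter pattern: each octet is limited to 0 - 255 and the
// optional port to 1 - 5 digits (the port value itself is not range-checked).
static NSString *StrictIPv4Pattern(void) {
    NSString *octet = @"(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
    return [NSString stringWithFormat:@"(%@\\.){3}%@(:\\d{1,5})?", octet, octet];
}

int main(void) {
    @autoreleasepool {
        NSString *loose = @"(\\d{1,3}\\.){3}(\\d){1,3}(:\\d{1,})?";  // pattern from the gist
        NSString *strict = StrictIPv4Pattern();

        // Both patterns accept ordinary addresses, with or without a port.
        NSLog(@"%d", MatchesWhole(loose, @"192.168.0.1"));       // logs 1
        NSLog(@"%d", MatchesWhole(strict, @"192.168.0.1"));      // logs 1
        NSLog(@"%d", MatchesWhole(strict, @"10.0.0.1:8080"));    // logs 1

        // Only the strict pattern rejects out-of-range octets.
        NSLog(@"%d", MatchesWhole(loose, @"999.999.999.999"));   // logs 1
        NSLog(@"%d", MatchesWhole(strict, @"999.999.999.999"));  // logs 0

        // Too few groups fail either way.
        NSLog(@"%d", MatchesWhole(strict, @"1.2.3"));            // logs 0
    }
    return 0;
}
```

The alternatives inside the octet group are ordered from the most specific (250 - 255) to the most general (0 - 99), which keeps each branch easy to read.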
// DNS address validation check: scheme, dot- or slash-separated segments, optional trailing :port.
- (BOOL)isValidDNSAddress {
    NSString *result = self;
    // Assume http when the string has no scheme.
    if (![self containsProtocol]) {
        result = [NSString stringWithFormat:@"http://%@", self];
    }
    // Note: inside [ ] the | is a literal, so [-|_] and [\\.|/] also accept a pipe character.
    NSString *urlRegEx =
        @"(http|https)://((\\w)*|([0-9]*)|([-|_])*)+([\\.|/]((\\w)*|([0-9]*)|([-|_])*))+(:[0-9]{1,})?";
    NSPredicate *urlTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", urlRegEx];
    return [urlTest evaluateWithObject:result];
}
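The methods above are written as instance methods on NSString and call containsProtocol, which is not shown in this post. To try them out, something like the following sketch could be used. The category name URLCheckSketch and the body of containsProtocol are assumptions (here it simply checks for a leading http:// or https://); only isValidIPAddress is taken from the gist.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical category name; the original one is not given in the post.
@interface NSString (URLCheckSketch)
- (BOOL)containsProtocol;
- (BOOL)isValidIPAddress;
@end

@implementation NSString (URLCheckSketch)

// Assumption: "contains a protocol" means the string starts with
// http:// or https:// (case-insensitive).
- (BOOL)containsProtocol {
    NSRange found = [self rangeOfString:@"^https?://"
                                options:NSRegularExpressionSearch | NSCaseInsensitiveSearch];
    return found.location != NSNotFound;
}

// Same logic as the gist: strip the scheme, then test the whole string.
- (BOOL)isValidIPAddress {
    NSString *result = self;
    if ([self containsProtocol]) {
        result = [result componentsSeparatedByString:@"//"].lastObject;
    }
    NSString *ipRegEx = @"(\\d{1,3}\\.){3}(\\d){1,3}(:\\d{1,})?";
    NSPredicate *ipTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", ipRegEx];
    return [ipTest evaluateWithObject:result];
}

@end

int main(void) {
    @autoreleasepool {
        NSLog(@"%d", [@"https://example.com" containsProtocol]);       // logs 1
        NSLog(@"%d", [@"example.com" containsProtocol]);               // logs 0
        NSLog(@"%d", [@"http://192.168.0.1:8080" isValidIPAddress]);   // logs 1
        NSLog(@"%d", [@"192.168.0.1" isValidIPAddress]);               // logs 1
        NSLog(@"%d", [@"example.com" isValidIPAddress]);               // logs 0
    }
    return 0;
}
```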

For more details and test cases, please check here.
