User:Riblet15/Regex

From Old School RuneScape Wiki

Regular expressions are a tool you can use to search through text using patterns. Think of it like find/replace in Notepad with some special symbols thrown in. This guide is meant to be a basic introduction to using regular expressions in the Minimal OSRS Items Database (MOID). I'll try to show an example of when you might want to use each expression when searching for items in MOID. I encourage you to try running each of these sample patterns and modifying them to understand what groups of items will be found.

This tutorial only covers the most useful parts of regex in the context of searching MOID. For a more thorough guide, there are many resources available online, such as regular-expressions.info and Wikipedia.

Syntax

Regular expression patterns will be shown in /slashes/. This is how MOID recognizes that you want to search by pattern. If you leave out the slashes, it will assume you want to find exact matches only. All searches in MOID are case-insensitive, meaning /foo/ and /Foo/ will show the same results.
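If you want to experiment outside MOID, the behaviour described above can be roughly imitated in Python. This is only a sketch: the function name moid_search is made up for illustration, and Python's re module is an assumed stand-in for whatever engine MOID actually uses, so edge cases may differ.

```python
import re

def moid_search(query, names):
    """Hypothetical stand-in for the MOID search box (not MOID's real code)."""
    if len(query) >= 2 and query.startswith("/") and query.endswith("/"):
        # Slashes present: treat the inside as a case-insensitive pattern
        # that may match anywhere in the name.
        pattern = re.compile(query[1:-1], re.IGNORECASE)
        return [n for n in names if pattern.search(n)]
    # No slashes: only an exact (case-insensitive) match of the whole name.
    return [n for n in names if n.lower() == query.lower()]

names = ["Fish food", "Rabbit foot"]
print(moid_search("/foo/", names))        # both names contain "foo"
print(moid_search("foo", names))          # no item is named exactly "foo"
print(moid_search("rabbit foot", names))  # exact match, ignoring case
```

The later examples in this guide can be tried the same way by swapping in a different query string.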

Basic matching

Literal characters

/foo/

A basic search will look for the literal characters you have typed as a substring anywhere in the name of an item. This pattern looks for all the items containing foo, like Fish food and Rabbit foot.

Escape sequences

/p\+/

The issue with literal searches is that a lot of characters are special. For all the special characters that come up in this guide like + and (, you can use a backslash character to escape them. Since the + character has a special purpose, this pattern needs to use the escape sequence \+ to search for everything containing p+.
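To see the difference an escape makes, here is a small Python sketch. It assumes Python's re module behaves like MOID for these patterns, and the item names are just sample strings.

```python
import re

names = ["Dragon dagger(p+)", "Dragon dagger(p)", "Rabbit foot"]

escaped = re.compile(r"p\+", re.IGNORECASE)   # literal "p" followed by a literal "+"
unescaped = re.compile(r"p+", re.IGNORECASE)  # one or more "p" characters

print([n for n in names if escaped.search(n)])    # only the name containing "p+"
print([n for n in names if unescaped.search(n)])  # every name containing a "p"

# re.escape adds the backslashes for you when you want a fully literal search.
print(re.escape("(p+)"))
```

Without the backslash, the + is read as a quantifier, so the second search also finds names that merely contain the letter p.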

Wildcards

/a.g/

The special character . can be used as a wildcard to match any character. This pattern matches dagger, strange, and even Santa gloves, since a space also counts as a character.

Character sets

/[zm]ogr/

Square brackets can be used as an OR statement for a list of characters. Any single character can be chosen from within the square brackets. This pattern will match mogre and zogre since either z or m can be used in the match.

/fis[^ht]/

Character sets can be negated by putting the ^ character first. A negated character class matches any single character that is not within the square brackets. This pattern means a character must appear after fis that is not h or t. The query here will match fiscal but not swordfish.

/m[0-4]/

Inside square brackets, the - symbol creates a range of characters. The character set in this pattern matches any digit from 0 to 4, so the pattern will match (m1), (m2), etc. Ranges of letters work too: /[s-w]/ matches any letter from s to w.

/page \d/

There are more shortcuts for certain character sets. The most useful are \d for digits, \s for whitespace, and \w for word characters (letters, numbers, and underscore). This pattern matches page 1 and page 2, etc.
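The character set examples above can be checked with a short Python sketch. The helper name matches and the sample strings (including "(m5)") are made up for illustration, and Python's re module is assumed to agree with MOID on these basics.

```python
import re

def matches(pattern, names):
    """Return the names that contain a case-insensitive match for pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [n for n in names if rx.search(n)]

names = ["mogre", "zogre", "swordfish", "fiscal", "(m1)", "(m5)", "page 1", "page 2"]

print(matches(r"[zm]ogr", names))    # z or m, then "ogr"
print(matches(r"fis[^ht]", names))   # "fis" followed by anything except h or t
print(matches(r"m[0-4]", names))     # "m" followed by a digit from 0 to 4
print(matches(r"page \d", names))    # "page " followed by any digit
```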

Or operator

/dag(ger|anno)/

The pipe character can also be used for OR statements. The main difference here is that you can use longer strings in the parentheses. This pattern will match dagger and also match dagannoth. You are also able to have more than two options separated by pipes.
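A quick Python sketch of the pipe, under the same assumption that Python's re module is a fair stand-in for MOID. The helper and the sample names are only for illustration.

```python
import re

def matches(pattern, names):
    """Return the names that contain a case-insensitive match for pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [n for n in names if rx.search(n)]

names = ["Iron dagger", "Dagannoth bones", "Dragon med helm"]

# Inside parentheses the pipe only chooses between "ger" and "anno";
# the leading "dag" is required either way.
print(matches(r"dag(ger|anno)", names))

# Without parentheses the pipe splits the whole pattern in two:
# either "dagger" or "helm".
print(matches(r"dagger|helm", names))
```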

Quantifiers

Quantifiers give you a way to match repeated patterns. This is where expressions start to get really fun (and confusing).

Zero or more

/ro*d/

The asterisk means that the previous character must appear zero or more times. In this example the letter o can appear any number of times, so we will match sword, Fishing rod, and Broodoo shield.

One or more

/ro+d/

The plus sign means the previous character must appear one or more times. Unlike the previous example, we won't match sword because there has to be at least one o in the middle of r and d.

Optional / Zero or one

/gu?ard/

The question mark means that the previous character must appear zero or one time. In this example the u is optional, so we will match both guard and garden.
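The three quantifiers so far can be compared side by side in Python. As before, this is a sketch that assumes Python's re module matches MOID's behaviour here; the helper and the non-matching sample "Guam leaf" are there only for illustration.

```python
import re

def matches(pattern, names):
    """Return the names that contain a case-insensitive match for pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [n for n in names if rx.search(n)]

rod_names = ["sword", "Fishing rod", "Broodoo shield"]
print(matches(r"ro*d", rod_names))  # zero or more "o": "rd", "rod", "rood"
print(matches(r"ro+d", rod_names))  # at least one "o": "sword" drops out

guard_names = ["guard", "garden", "Guam leaf"]
print(matches(r"gu?ard", guard_names))  # "u" is optional: "guard" or "gard"
```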

Specific numbers

/go{2}d/

A number can be put in curly brackets for more control. A single number in curly brackets means that the previous character must occur exactly that many times. In this example we will match Good anthem, since the pattern is the same as typing /good/.

/g.{2,4}d/

By using two numbers in the curly braces, the pattern will match the previous character any number of times between those bounds inclusively. Since in this example the character before the curly brackets is a wildcard, any character is allowed to be matched. For this pattern we will match g, followed by between 2 and 4 of any characters, followed by a d. Therefore we match Empty gourd vial (with 3 characters between g and d) as well as longsword (with 4 characters between g and d), etc.

/g.{15,}d/

By using only one number with a comma in the curly braces, the count is bounded only on the lower side. In this search we will look for the previous character 15 or more times. This ends up matching the item Orange rainbow strand.
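The curly-bracket counts can be tried out with the item names mentioned above. This is a Python sketch assuming the re module counts repetitions the same way MOID does.

```python
import re

def matches(pattern, names):
    """Return the names that contain a case-insensitive match for pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [n for n in names if rx.search(n)]

names = ["Good anthem", "Empty gourd vial", "longsword", "Orange rainbow strand"]

print(matches(r"go{2}d", names))     # exactly two "o": same as /good/
print(matches(r"g.{2,4}d", names))   # 2 to 4 of any character between g and d
print(matches(r"g.{15,}d", names))   # 15 or more of any character between g and d
```

Note that Good anthem also shows up for /g.{2,4}d/, since its two o characters fall within the allowed range.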

Anchors

Start of string

/^x/

The caret symbol can be used to refer to the start of the string. Since nothing can come before the start of the string, use of the caret has to come first in the expression. This example matches all the items that start with the letter x.

End of string

/az$/

The dollar sign refers to the end of the string. This one has to be at the end of the pattern because nothing can come after the end of the string. For this example we will match Red topaz but we will not match Maze key.

Start and end of string

/^raw.*pie$/

The anchors are commonly used together to define the entire item name. This fancy combo move says the item has to start with the string raw, followed by zero or more of any character, and then end with the string pie. As you might expect, this matches Raw fish pie among other pies.
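Here is a small Python sketch showing what the anchors rule out. It assumes Python's re module treats ^ and $ like MOID does for single-line item names; the extra names Fish pie and Raw shark are sample strings added for contrast.

```python
import re

def matches(pattern, names):
    """Return the names that contain a case-insensitive match for pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [n for n in names if rx.search(n)]

names = ["Red topaz", "Maze key", "Raw fish pie", "Fish pie", "Raw shark"]

print(matches(r"az", names))           # unanchored: "az" anywhere in the name
print(matches(r"az$", names))          # "az" only at the very end
print(matches(r"^raw", names))         # "raw" only at the very start
print(matches(r"^raw.*pie$", names))   # starts with "raw" and ends with "pie"
```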

Capturing groups

Sometimes you need parts of the pattern to be grouped together. You can use capturing groups to repeat parts of the pattern easily.

Parentheses

/(foo)/

A capturing group is defined by putting parentheses around parts of the pattern. On its own, a capturing group doesn't do anything special. This example here is the same as just searching /foo/.

/b(an)+/

A capturing group gets interesting when you use quantifiers. In this case, the + symbol references the entire group within parentheses. This means the entire group (an) has to be matched one or more times, such as an, anan, ananan, etc. This means we will match Red headband, as well as Banana.

/(a.){3}/

You can combine features like capturing groups, wildcards, and quantifiers. Since this capturing group has curly braces after it, the group gets copied three times like /a.a.a./. The wildcard can refer to a different character each time, so we end up matching Adamantite ore as well as Papaya fruit. Note that in Papaya fruit, the space counts as a character when matching the . symbol.
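To see exactly which part of each name a quantified group covers, this Python sketch prints the matched text instead of just the name. The helper name matched_text is made up, and Python's re module is assumed to behave like MOID for these patterns.

```python
import re

def matched_text(pattern, names):
    """Map each matching name to the part of it that the pattern matched."""
    rx = re.compile(pattern, re.IGNORECASE)
    found = {}
    for name in names:
        m = rx.search(name)
        if m:
            found[name] = m.group(0)
    return found

names = ["Red headband", "Banana", "Adamantite ore", "Papaya fruit"]

repeated_an = matched_text(r"b(an)+", names)    # "b" then "an" one or more times
three_pairs = matched_text(r"(a.){3}", names)   # same as a.a.a.

print(repeated_an)
print(three_pairs)
```

In Banana the group repeats twice, and in Papaya fruit the last wildcard lands on the space.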

Backreferences

/(b..)\1/

Backreferences are written as a backslash followed by a number. The \1 means to match the exact content of the first capturing group again. In this example, there is exactly one capturing group (b..), which will match b followed by any two characters. This group of 3 characters then has to occur in exactly the same way again. This pattern will match Barbarian rod, since the group bar appears exactly the same twice. Note that this is unlike the pattern /b..b../. We will not match Star bauble, because using a backreference forces the two occurrences to be exactly the same.

/(.)(.)\2\1/

If you have multiple capturing groups, they will be numbered off starting at 1. You can reference each specific group by its number. In this pattern we have two separate capturing groups that are followed by two backreferences. The two characters matched in the groups must then appear in the opposite order, such as Cabbage and Suqah tooth.
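The difference between repeating a pattern and repeating its match is easy to check in Python. This sketch assumes Python's re module handles numbered backreferences the same way MOID does.

```python
import re

def matches(pattern, names):
    """Return the names that contain a case-insensitive match for pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [n for n in names if rx.search(n)]

names = ["Barbarian rod", "Star bauble", "Cabbage", "Suqah tooth"]

print(matches(r"b..b..", names))        # any two "b.." chunks in a row
print(matches(r"(b..)\1", names))       # the same "b.." chunk twice
print(matches(r"(.)(.)\2\1", names))    # two characters, then the same two reversed
```

The first search finds both Barbarian rod and Star bauble, while the backreference version keeps only Barbarian rod.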


Common patterns

/.*/

This is a common pattern that matches any number of any characters. This often goes between two things if you require them both to appear anywhere in the string.

/[a-zA-Z]/

This set refers to any letter, upper or lowercase. Recall that in MOID the search doesn't care about letter casing, but this is very common in other places that use regex.

/Greenman'?s/

Item names aren't always consistent, so you may need to mark some characters as optional. The apostrophe in greenman's is sometimes there and sometimes not. The same should be considered when searching for space characters.

/[^ ]/

It may be useful to match everything except a space. This character class lets you quickly match letters, numbers, and other symbols besides space.

/^...$/

Using the start and end anchors together defines the entire string. This is useful since otherwise the pattern will match substrings anywhere within the item name. This example matches items that are exactly 3 characters long.
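A few of these common patterns can be tried together in Python. The names below are sample strings that I have not checked against the database, and Python's re module is assumed to be close enough to MOID for these cases.

```python
import re

def matches(pattern, names):
    """Return the names that contain a case-insensitive match for pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [n for n in names if rx.search(n)]

names = ["Greenman's ale", "Greenmans ale(m)", "Pot", "Pot of flour"]

print(matches(r"greenman'?s", names))    # apostrophe present or missing
print(matches(r"^pot.*flour$", names))   # two required parts with anything between
print(matches(r"^...$", names))          # names exactly 3 characters long
print(matches(r"^[^ ]+$", names))        # names with no space at all
```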

Quick reference table

Syntax    Description
*         match the previous character or group zero or more times
+         match the previous character or group one or more times
?         match the previous character or group zero or one time
{2,4}     match the previous character or group 2 to 4 times
.         match any single character
[a-z]     match any single character from within the set
(a|b)     match a or b
^         match the start of the string
$         match the end of the string
(abc)     create a capturing group for abc
\1        backreference the first capturing group

Don't forget to escape these characters with a backslash if you want to use them literally: *+?{}.[]|^$()\

Challenges

Now that you're an expert at the basics of regex, are you up to the challenge? The goal is to use MOID and write a pattern that will find the items that match the descriptions. The challenges are listed approximately in order of difficulty.

Challenge Solution Spoiler
Example This item contains the sequence "tne". /tne/ spoiler
1 This item contains the sequence "s(u". spoiler
2 This item starts with "f" and ends with "ff". spoiler
3 This item contains only vowels (no symbols). spoiler
4 These two 4-letter items are made up only of letters in the phrase "cook me plox", allowing repeats. spoiler
5 This item has the longest name that is made up only of letters in the phrase "spineweilder". spoiler
6 This item starts with either "we" or "wi" and ends with either "op" or "ws". spoiler
7 The count of apostrophe and period characters in the name of this item is 4. spoiler
8 This item starts and ends with the letter "t", and contains an apostrophe. spoiler
9 This item starts and ends with the same character, and contains an &. spoiler
10 This item ends with an s, and the first 6 characters appear in the same order twice. spoiler
11 This item has the longest single word made only of letters. spoiler
12 This item starts with the letter C and contains 3 double letters. easy spoiler harder spoiler
13 This item has a non-ascii character. spoiler1 spoiler2
14 Each of the three words in this item begins with the same two-character sequence. (Bonus: the regex should support any number of words 3 or higher) spoiler
15 This item has more than 1 word, and all the words start and end with the exact same character. spoiler
16 For the three words in this item, the last letter of each word is the same as the first letter of the next word. (Bonus: the regex should support any number of words 3 or higher) easy spoiler harder spoiler

Epilogue

Let me know if you think anything is too confusing or you have suggestions for other useful features. If you can come up with other good challenge questions feel free to add them to the list. Riblet15 (talk) 23:58, 18 November 2018 (UTC)