I need a regex to get stuff between <li> and </li>

  • Thread starter SlurrerOfSpeech
In summary, the conversation discusses extracting specific content from a larger piece of text using regular expressions in C#.NET (System.Text.RegularExpressions). The source expression is <h2>Friends</h2><ul><li>Some Guy</li><li>Some Other Guy</li><li>Some Guy 2</li><li>Some W. Bush</li></ul> and the desired result is the text between the <li> tags. One reply suggests reading the first capture group ($1 or \1) instead of the full match, another asks whether a loop over the groups is acceptable or the solution must be regex only, and a third offers a pattern built on lookahead and lookbehind.
  • #1
SlurrerOfSpeech
What I'm ultimately trying to do is get
Code:
Some Guy, Some Other Guy, Some Guy 2, Some W. Bush
from an expression like

Code:
<h2>Friends</h2><ul><li>Some Guy</li><li>Some Other Guy</li><li>Some Guy 2</li><li>Some W. Bush</li></ul>

This expression sits inside a much larger piece of text, but it is the only place an expression of this exact form appears. I'm using

Code:
(?s)<h2>Friends</h2>.*?<ul>.*?</ul>

to get the expression and

Code:
<li>([a-zA-Z0-9. ]+)</li>

to get

Code:
<li>Some Guy</li>, <li>Some Other Guy</li>, <li>Some Guy 2</li> and <li>Some W. Bush</li>
but I actually want what's BETWEEN the tags.
 
  • #2
$1 or \1 (depending on the interpreter) will give the content of the first capture group (the first pair of parentheses) instead of the full match, which here is the content inside the tags.
Alternatively, lookahead and lookbehind are an option, but more complicated and not necessary here.
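Since the thread is about C#.NET, here is a minimal sketch of the capture-group idea using System.Text.RegularExpressions, where group 1 is read through `Match.Groups[1]`. Both patterns are the ones from post #1; the class name `FriendExtractor`, the method name `ExtractFriends`, and the sample page text are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FriendExtractor
{
    // Returns the text inside each <li> of the "Friends" list, or an empty
    // array if the list is not found. Names here are illustrative only.
    public static string[] ExtractFriends(string page)
    {
        // Step 1: isolate the Friends block with the pattern from post #1.
        Match block = Regex.Match(page, @"(?s)<h2>Friends</h2>.*?<ul>.*?</ul>");
        if (!block.Success)
        {
            return new string[0];
        }

        // Step 2: Groups[0] is the full match (<li>...</li>);
        // Groups[1] is only what the parentheses captured.
        return Regex.Matches(block.Value, @"<li>([a-zA-Z0-9. ]+)</li>")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Hypothetical page: one unrelated list, then the Friends list.
        string page = "<ul><li>Not a friend</li></ul>\n"
                    + "<h2>Friends</h2>\n"
                    + "<ul><li>Some Guy</li><li>Some Other Guy</li>"
                    + "<li>Some Guy 2</li><li>Some W. Bush</li></ul>";

        Console.WriteLine(string.Join(", ", ExtractFriends(page)));
        // prints: Some Guy, Some Other Guy, Some Guy 2, Some W. Bush
    }
}
```

The character class `[a-zA-Z0-9. ]` is kept as in the original post, so names containing other characters (hyphens, apostrophes, accented letters) would be skipped.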
 
  • #3
What language and/or regular expression engine are you using?
 
  • #4
FactChecker said:
What language and/or regular expression engine are you using?

C#.NET, System.Text.RegularExpressions
 
  • #5
Are you ok with using a loop to extract out the groups? Or should it be regex only?
 
  • #7
Hmm, I don't see how to get subpatterns there.

You can try (?<=<li>)[a-zA-Z0-9. ]+(?=</li>), which uses lookahead and lookbehind.
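As a rough sketch of how the lookaround pattern behaves in .NET: because `(?<=<li>)` and `(?=</li>)` only assert that the tags are present without consuming them, the full match (`Match.Value`) is already the text between the tags and no capture group is needed. The class and method names below are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Returns every run of allowed characters that is directly preceded
    // by <li> and directly followed by </li>.
    public static string[] ExtractItems(string html)
    {
        string pattern = @"(?<=<li>)[a-zA-Z0-9. ]+(?=</li>)";
        return Regex.Matches(html, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value) // full match, tags not included
                    .ToArray();
    }

    public static void Main()
    {
        string html = "<h2>Friends</h2><ul><li>Some Guy</li><li>Some Other Guy</li>"
                    + "<li>Some Guy 2</li><li>Some W. Bush</li></ul>";

        Console.WriteLine(string.Join(", ", ExtractItems(html)));
        // prints: Some Guy, Some Other Guy, Some Guy 2, Some W. Bush
    }
}
```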
 

FAQ: I need a regex to get stuff between <li> and </li>

What is a regex?

A regex (short for regular expression) is a sequence of characters that defines a search pattern. It is commonly used in computer science and programming to match and manipulate strings of text.

Why do I need a regex to get stuff between <li> and </li>?

Using a regex allows you to easily find and extract information from a larger string of text. In this case, it can help you specifically target the content between <li> and </li> tags, which can be useful for tasks like data parsing or web scraping.

How do I create a regex to get stuff between <li> and </li>?

To create a regex, you will need to use a combination of characters, special symbols, and modifiers to define your search pattern. In this case, you can use the "<li>.*?</li>" pattern to match any text between <li> and </li> tags.
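To illustrate why the lazy form `.*?` matters here, the following sketch compares it with the greedy `.*` on two adjacent list items, wrapping each in parentheses so the inner text can be read from group 1. It assumes C# with System.Text.RegularExpressions, as in the thread; the class and method names are invented for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LazyVersusGreedy
{
    // Collects group 1 of every match of the given pattern.
    public static string[] Captures(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }

    public static void Main()
    {
        string html = "<li>Some Guy</li><li>Some Other Guy</li>";

        // Lazy: stops at the first </li>, so each item is matched separately.
        Console.WriteLine(string.Join(" | ", Captures(html, "<li>(.*?)</li>")));
        // prints: Some Guy | Some Other Guy

        // Greedy: runs to the last </li>, swallowing the tags in between.
        Console.WriteLine(string.Join(" | ", Captures(html, "<li>(.*)</li>")));
        // prints: Some Guy</li><li>Some Other Guy
    }
}
```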

What tools can I use to test my regex?

There are many online tools available that allow you to input a regex pattern and test it against a sample text. Some popular options include RegExr, Regex101, and RegexTester.

Are there any resources that can help me learn how to use regex?

Yes, there are many online tutorials and resources available that can help you learn how to use regex effectively. Some popular options include regular-expressions.info, RegexOne, and the official documentation for your programming language of choice.
