Help needed with regex and html [SOLVED]

MultiplexLayout
Posts: 17
Joined: 2020-09-23 19:21

#1 Post by MultiplexLayout »

I am trying to modify an EPUB file that uses image files in place of accented letters, and I am replacing the images with Unicode characters. To do this, I need to run a find and replace with a regex that matches the image tag, from the opening angle bracket to the closing one. However, consider the following:

Code:

<img alt="image" src="c0011-01.jpg"/>s</strong> because the <strong>mi</strong> is short, but <strong>a-m<img alt="image" src="c0007-01.jpg"/>-tus</strong> because the <strong>m<img alt="image" src="c0025-01.jpg"/>

I want to target the image tag containing c0007-01.jpg, but it is flanked by two other image tags. Every regex I have tried matches from the first image tag (c0011-01.jpg) to the third (c0025-01.jpg). I need a regex that:
  • starts at the "<" and ends at the ">" (so I can execute a find and replace cleanly)
  • must contain c0007-01.jpg
  • does not contain any additional "<" within
If I have a regex that fulfills the above criteria, I'm fairly sure that it will only target the tag I want. Any help would be greatly appreciated.
Last edited by MultiplexLayout on 2021-05-10 14:39, edited 1 time in total.

dilberts_left_nut
Posts: 5129
Joined: 2009-10-05 07:54
Location: enzed
Has thanked: 1 time
Been thanked: 1 time

Re: Help needed with regex and html

#2 Post by dilberts_left_nut »

You need a 'non greedy' match.
Probably include a 'NOT <' term.
I can't spit one out ATM, but that might help your search.
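
To illustrate the hint, here is a minimal Python sketch run against the sample line from the first post. The three patterns are assumptions written for illustration, not something posted in this thread. It suggests that a non-greedy quantifier on its own still starts at the first `<img`, because the regex engine takes the leftmost match, whereas a "NOT <" character class keeps the match inside a single tag.

```python
import re

# Sample line from the first post, containing three <img> tags.
sample = (
    '<img alt="image" src="c0011-01.jpg"/>s</strong> because the '
    '<strong>mi</strong> is short, but <strong>a-m'
    '<img alt="image" src="c0007-01.jpg"/>-tus</strong> because the '
    '<strong>m<img alt="image" src="c0025-01.jpg"/>'
)
target = '<img alt="image" src="c0007-01.jpg"/>'

# Greedy: runs from the first "<img" to the last ">" on the line.
greedy = re.search(r'<img.*c0007-01\.jpg.*>', sample).group(0)

# Non-greedy: ends at the right ">", but still begins at the first "<img".
lazy = re.search(r'<img.*?c0007-01\.jpg.*?>', sample).group(0)

# Negated class: "[^<>]" cannot cross a tag boundary, so only one tag matches.
bounded = re.search(r'<img[^<>]*c0007-01\.jpg[^<>]*>', sample).group(0)

print(greedy == sample)                                   # whole line matched
print(lazy.endswith(target) and lazy != target)           # too long on the left
print(bounded == target)                                  # exactly the wanted tag
```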
AdrianTM wrote:There's no hacker in my grandma...

MultiplexLayout
Posts: 17
Joined: 2020-09-23 19:21

Re: Help needed with regex and html

#3 Post by MultiplexLayout »

dilberts_left_nut wrote:
You need a 'non greedy' match.
Probably include a 'NOT <' term.
I can't spit one out ATM, but that might help your search.

This gave me the insight I needed. Thank you. For anyone stumbling on this thread the following regex solved my problem:

Code:

<img[^\/]*?c0007-01.jpg[^\/]*?\/>
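
For readers who want to try the find and replace outside an editor, here is a small Python sketch that applies the regex above unchanged. The replacement character is a placeholder assumption: the thread does not say which accented letter c0007-01.jpg stands for. The shortened sample text is also only for illustration.

```python
import re

# Regex from the post above, copied as-is.
pattern = re.compile(r'<img[^\/]*?c0007-01.jpg[^\/]*?\/>')

# Hypothetical replacement: U+0101 (a with macron). Substitute whichever
# Unicode character the image actually represents.
replacement = '\u0101'

# Shortened excerpt of the sample line, with the target tag and one other tag.
text = (
    '<strong>a-m<img alt="image" src="c0007-01.jpg"/>-tus</strong> because the '
    '<strong>m<img alt="image" src="c0025-01.jpg"/>'
)

# subn returns the new string and the number of substitutions made.
result, count = pattern.subn(replacement, text)
print(result)
print(count)
```

Two caveats, offered tentatively: `[^\/]` stops at any slash, so the pattern would presumably fail on a `src` that includes a directory path, and the unescaped dot in `c0007-01.jpg` matches any character rather than only a literal period.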
