Sometimes, when we store strings in a database, they are stored together with their HTML tags. However, some websites need to render those strings as plain text, without any of the HTML markup that is kept in the database. In this tutorial, we will look at different ways to remove HTML tags from a string in Python.
A regular expression is a sequence of characters that describes a search pattern. From Python's regular expression module, re, we use the sub() function, which replaces every substring that matches a given pattern with another string. The code below uses a regular expression to remove HTML tags from a string.
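import re

# Matches "<", then one or more characters that are not ">", then ">"
regex = re.compile(r'<[^>]+>')

def remove_html(string):
    # Replace every match of the tag pattern with an empty string
    return regex.sub('', string)

text = input("Enter String:")
new_text = remove_html(text)
print(f"Text without html tags: {new_text}")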
Output 1:
Enter String:<div class="header"> Welcome to my website </div>
Text without html tags: Welcome to my website
Output 2:
Enter String:<h1> Hello </h1>
Text without html tags: Hello
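As a quick check of what the pattern does and does not handle, the sketch below runs the same regular expression over a few sample strings. The sample inputs and the name TAG_PATTERN are made up for illustration. The last sample shows a limitation: the pattern stops at the first > it sees, so a > inside a quoted attribute value leaves part of the tag behind. The leading and trailing spaces in the first result come from the input itself and could be removed with strip() if needed.

```python
import re

# Same pattern as in the tutorial: "<", one or more non-">" characters, ">".
# TAG_PATTERN is an illustrative name, not part of the original code.
TAG_PATTERN = re.compile(r'<[^>]+>')

def remove_html(string):
    return TAG_PATTERN.sub('', string)

samples = [
    '<h1> Hello </h1>',
    '<p>Some <b>bold</b> text</p>',
    '<a title="a > b">link</a>',  # a ">" inside a quoted attribute value
]

for sample in samples:
    print(repr(remove_html(sample)))
# ' Hello '
# 'Some bold text'
# ' b">link'
```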
The following code removes HTML tags from a string without using any built-in function.
def remove_html(string):
    tag = False    # True while we are between "<" and ">"
    quote = False  # True while we are inside a quoted attribute value
    output = ""
    for ch in string:
        if ch == '<' and not quote:
            tag = True
        elif ch == '>' and not quote:
            tag = False
        elif (ch == '"' or ch == "'") and tag:
            quote = not quote
        elif not tag:
            # Keep only characters that are outside a tag
            output = output + ch
    return output

text = input("Enter String:")
new_text = remove_html(text)
print(f"Text without html tags: {new_text}")
Output:
Enter String:<div class="header"> Welcome to my website </div>
Text without html tags: Welcome to my website
How does the above code work?
In the code above, we keep two Boolean flags, called tag and quote. The tag flag tracks whether we are currently inside a tag, and the quote flag tracks whether we are inside a single-quoted or double-quoted attribute value. We use a for loop to go through each character of the string. If the character is < and we are not inside quotes, tag is set to True; if the character is > and we are not inside quotes, tag is set to False. If the character is a single or double quotation mark and we are inside a tag, quote is toggled. Otherwise, the character is appended to the output string, but only when we are outside a tag. As a result, in the output of the above code the div tags are removed and only the text between them is left.
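To make the role of the two flags concrete, here is a minimal sketch that runs the same loop but also records the value of tag and quote after each character. The function name trace_remove_html and the sample input are hypothetical and only serve the illustration. The input contains a > inside a quoted attribute value, which the quote flag causes the loop to ignore.

```python
def trace_remove_html(string):
    """Return (output, steps); each step is (character, tag, quote)
    as the flags stand after that character has been processed."""
    tag = False
    quote = False
    output = ""
    steps = []
    for ch in string:
        if ch == '<' and not quote:
            tag = True
        elif ch == '>' and not quote:
            tag = False
        elif (ch == '"' or ch == "'") and tag:
            quote = not quote
        elif not tag:
            output = output + ch
        steps.append((ch, tag, quote))
    return output, steps

output, steps = trace_remove_html('<b a=">">x</b>')
for ch, tag, quote in steps:
    print(f"{ch!r:5} tag={tag!s:5} quote={quote}")
print(repr(output))  # 'x'
```

Because a single flag is toggled by both kinds of quotation mark, an apostrophe inside a double-quoted value (for example title="it's") would leave quote out of step; tracking which character opened the quote would be one way to address that.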
The code below uses the xml module to remove HTML tags from a string. XML is a markup language used to store and transport data. Python has built-in modules that help us parse XML documents. An XML document is made up of units called elements, each delimited by a start tag and an end tag (written between < and >). Everything between the start tag and the end tag is the content of the element. An element can contain other elements, called child elements. Using Python's ElementTree module, we can work with these XML documents easily.
import xml.etree.ElementTree

def remove_html(string):
    return ''.join(xml.etree.ElementTree.fromstring(string).itertext())

text = input("Enter String:")
new_text = remove_html(text)
print(f"Text without html tags: {new_text}")
Output:
Enter String:<p class="intro"> I love Coding </p>
Text without html tags: I love Coding
How does the above code work?
First, we import the xml.etree.ElementTree module.
We use the fromstring() function to parse the string into an XML element. To walk through the element returned by fromstring() and all of its child elements, we use the itertext() method. It iterates over each element in document order and yields the inner text found in it.
We use the join() method on an empty string to concatenate the pieces of inner text, and return the final output string.
Finally, we call the remove_html function to remove the HTML tags from the input string.
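The sketch below illustrates two points about this approach, using sample strings invented for the purpose. First, itertext() also collects the text of nested child elements. Second, fromstring() expects well-formed XML with a single root element, so typical HTML such as an unclosed <br>, plain text with no tags, or two sibling elements at the top level raises xml.etree.ElementTree.ParseError instead of returning text.

```python
import xml.etree.ElementTree as ET

def remove_html(string):
    return ''.join(ET.fromstring(string).itertext())

# Text from nested child elements is collected in document order.
nested = '<div><h1>Title</h1><p>Some <b>bold</b> text</p></div>'
print(remove_html(nested))  # TitleSome bold text

# Inputs that are not well-formed XML raise ParseError.
bad_inputs = [
    '<p>line one<br>line two</p>',  # <br> is never closed
    'no tags at all',               # no root element
    '<p>one</p><p>two</p>',         # more than one root element
]
for bad in bad_inputs:
    try:
        remove_html(bad)
    except ET.ParseError as error:
        print(f"{bad!r} -> ParseError: {error}")
```

If the input may be arbitrary HTML rather than well-formed XML, catching ParseError as shown and falling back to one of the earlier approaches is one possible design.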
This concludes the tutorial on how to remove HTML tags from a string in Python. You can use the following links to learn more about regular expressions in Python.