HTML in Python
Processing HTML Files: -
Text Data:
- Text data is one of the most common forms of unstructured data. It includes:
- Web pages (HTML)
- Articles, books, essays
- Social media posts
- Emails, chat logs
- Research papers
HTML: -
- HTML stands for HyperText Markup Language.
- It is the standard markup language used to create web pages.
- The name combines two ideas: hypertext and markup language.
- Hypertext defines the links between web pages.
- A markup language uses tags to define the structure of a text document, in this case a web page.
- Most markup languages (e.g. HTML) are human-readable.
- The tags tell the browser how the enclosed text should be structured and displayed.
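To make the ideas of tags and hypertext concrete, here is a minimal sketch that uses only Python's standard-library html.parser module. The page content, the name SAMPLE_HTML and the class TagCollector are assumptions made up for illustration. The sketch lists every opening tag it meets and the target of every link.

```python
from html.parser import HTMLParser

# Hypothetical page, used only for illustration.
SAMPLE_HTML = """<html>
  <head><title>Demo page</title></head>
  <body>
    <h1>Welcome</h1>
    <p>Read the <a href="about.html">about page</a>.</p>
  </body>
</html>"""


class TagCollector(HTMLParser):
    """Records every opening tag and every hyperlink target it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "a":
            # attrs is a list of (name, value) pairs for this tag.
            self.links.extend(value for name, value in attrs if name == "href")


collector = TagCollector()
collector.feed(SAMPLE_HTML)
print(collector.tags)   # the markup: tags that give the page its structure
print(collector.links)  # the hypertext: links pointing to other pages
```

The tags describe the structure of the page (markup), while the href attribute of the a tag is what connects one page to another (hypertext).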
BeautifulSoup module: -
- The BeautifulSoup module is used for parsing, accessing and modifying HTML.
- It builds a parse tree from the parsed page, which can then be used to navigate, search, extract and modify the data in the HTML.
- Install the BeautifulSoup4 module using the following command:
py -m pip install BeautifulSoup4
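Once the module is installed, the four operations mentioned above (navigate, search, extract, modify) can be sketched as follows. This is a minimal illustration, not a complete tour of the library. The HTML string is a made-up assumption, and the built-in "html.parser" backend is chosen here only because it needs no extra install.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used only for illustration.
html = """<html><head><title>Demo page</title></head>
<body>
<h1 id="main">Welcome</h1>
<p class="intro">First paragraph.</p>
<p>Second paragraph with a <a href="about.html">link</a>.</p>
</body></html>"""

# Build the parse tree with Python's built-in parser.
soup = BeautifulSoup(html, "html.parser")

# Navigate: walk the tree by tag name.
title_text = soup.title.string

# Search: find one tag or all tags that match.
paragraphs = [p.get_text() for p in soup.find_all("p")]
intro = soup.find("p", class_="intro").get_text()

# Extract: read an attribute value from a tag.
link_target = soup.a["href"]

# Modify: replace the text inside the first <h1> tag.
soup.h1.string = "Hello"
heading_html = str(soup.h1)

print(title_text)
print(paragraphs)
print(intro)
print(link_target)
print(heading_html)
```

Note that soup.title, soup.h1 and soup.a each return only the first matching tag, while find_all returns every match.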
Creating an HTML file named demo.html: -
output: -
To extract the content from HTML tags, we use the following Python code:
Example :
output: -
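A minimal sketch of this workflow is given below. The contents of demo.html here are an assumption made up for illustration, and the file is written to a temporary folder so the sketch can run on its own. It creates the file, opens it, and pulls the text out of a few tags with BeautifulSoup.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical contents for demo.html, used only for illustration.
DEMO_HTML = """<!DOCTYPE html>
<html>
<head><title>Demo</title></head>
<body>
<h1>Processing HTML</h1>
<p>BeautifulSoup parses this file.</p>
<ul>
<li>First item</li>
<li>Second item</li>
</ul>
</body>
</html>
"""

with tempfile.TemporaryDirectory() as tmp:
    # Create the HTML file.
    path = Path(tmp) / "demo.html"
    path.write_text(DEMO_HTML, encoding="utf-8")

    # Open the file and build the parse tree from it.
    with open(path, encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")

# Extract the text content of individual tags.
title = soup.title.get_text()
heading = soup.h1.get_text()
paragraph = soup.p.get_text()
items = [li.get_text() for li in soup.find_all("li")]

print("Title:", title)
print("Heading:", heading)
print("Paragraph:", paragraph)
print("List items:", items)
```

get_text() returns only the text between the opening and closing tags, with the tags themselves removed.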