HTML in Python

- August 28, 2025

Processing HTML Files: -

Text Data: - Text data is one of the most common forms of unstructured data. It includes:

Web pages (HTML)

Articles, books, essays

Social media posts

Emails, chat logs

Research papers

To analyse or extract useful information from text, we use techniques from text processing and Natural Language Processing (NLP).

HTML: -

HTML stands for HyperText Markup Language.

It is the standard markup language used to create web pages.

The name combines two ideas: hypertext and markup language.

Hypertext refers to the links that connect web pages to one another.

A markup language wraps the text of a document in tags, and those tags define the structure of the web page.

Most markup languages (e.g. HTML) are human-readable.

The tags tell the browser how each piece of text should be treated, for example as a heading, a paragraph or a link.

BeautifulSoup module: -

The BeautifulSoup module is used for parsing, accessing and modifying HTML.

It builds a parse tree from the parsed page, which can then be used to navigate, search, extract and modify the data in the HTML.
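As a minimal sketch of what navigating, searching and modifying a parse tree looks like, the snippet below parses a short HTML string. The markup, the class name `intro` and the URL are made up for illustration, and the built-in `html.parser` backend is assumed so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = (
    "<html><head><title>Demo</title></head>"
    "<body><p class='intro'>Hello <b>world</b></p>"
    "<a href='https://example.com'>link</a></body></html>"
)

# Build the parse tree with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Navigate: reach a tag through attribute access.
title_text = soup.title.string

# Search: find the first <p> tag with class "intro".
intro = soup.find("p", class_="intro")
intro_text = intro.get_text()

# Extract: read an attribute value from the first <a> tag.
link_url = soup.a["href"]

# Modify: replace the text inside the <b> tag.
intro.b.string = "Python"
modified_text = intro.get_text()

print(title_text, intro_text, link_url, modified_text, sep="\n")
```

Attribute access such as `soup.title` returns only the first matching tag, so `find` or `find_all` is usually the safer choice when a page may contain several tags of the same kind.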

Install the BeautifulSoup4 module with the following command (py is the Windows launcher; on Linux or macOS use python3 instead):

py -m pip install BeautifulSoup4

Creating an HTML file named demo.html: -
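The original contents of demo.html are not reproduced here, so the following is only a stand-in: a short Python script that writes a small, hypothetical page to demo.html. The constant `DEMO_HTML`, the helper `write_demo_html` and the page text are all assumptions made for the sake of a runnable example.

```python
from pathlib import Path

# Hypothetical page contents (an assumption, not the original demo.html).
DEMO_HTML = """<!DOCTYPE html>
<html>
  <head>
    <title>Demo page</title>
  </head>
  <body>
    <h1>HTML in Python</h1>
    <p>This page is used to practise parsing.</p>
    <a href="https://example.com">An example link</a>
  </body>
</html>
"""


def write_demo_html(path="demo.html"):
    """Write the sample markup to `path` and return the resulting Path."""
    target = Path(path)
    target.write_text(DEMO_HTML, encoding="utf-8")
    return target


demo_path = write_demo_html()
print(f"Wrote {demo_path}")
```

The same file can of course be typed by hand in any text editor; writing it from Python simply keeps the whole exercise in one place.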

output: -

To extract the content from HTML tags, we use the following Python code:

Example :
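The original listing is not available, so this is a hedged sketch of one way to pull text out of tags with BeautifulSoup. The helper name `extract_content` and the stand-in markup in `sample_html` are assumptions; only `BeautifulSoup`, `find_all` and `get_text` come from the library itself.

```python
from bs4 import BeautifulSoup


def extract_content(html_text):
    """Return the title, headings, paragraph texts and link targets in html_text."""
    soup = BeautifulSoup(html_text, "html.parser")
    return {
        # soup.title is None when the page has no <title> tag.
        "title": soup.title.get_text().strip() if soup.title else None,
        "headings": [h.get_text().strip() for h in soup.find_all(["h1", "h2", "h3"])],
        "paragraphs": [p.get_text().strip() for p in soup.find_all("p")],
        # href=True skips <a> tags that have no href attribute.
        "links": [a["href"] for a in soup.find_all("a", href=True)],
    }


# Stand-in markup (an assumption); the real demo.html is not shown.
sample_html = """<!DOCTYPE html>
<html>
  <head><title>Demo page</title></head>
  <body>
    <h1>Welcome</h1>
    <p>First paragraph.</p>
    <p>Second paragraph with a <a href="https://example.com">link</a>.</p>
  </body>
</html>"""

content = extract_content(sample_html)
for key, value in content.items():
    print(f"{key}: {value}")
```

To run the same extraction on a file, read it first, for instance with `open("demo.html", encoding="utf-8").read()`, and pass the resulting string to `extract_content`.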

output: -