HTML in Python

Processing HTML Files: -

    Text Data: - Text data is one of the most common forms of unstructured data. It includes:

  • Web pages (HTML)
  • Articles, books, essays
  • Social media posts
  • Emails, chat logs
  • Research papers

    To analyse or extract useful information from text, we use techniques from text processing and Natural Language Processing (NLP).

 

HTML: -

  • HTML stands for HyperText Markup Language.
  • It is the standard markup language used to create web pages.
  • As the name suggests, HTML combines hypertext with a markup language.
  • Hypertext is text that contains links to other web pages.
  • A markup language encloses text in tags, and these tags define the structure of a web page.
  • Most markup languages (e.g. HTML) are human-readable.
  • HTML uses tags to describe how each piece of text is structured and displayed.
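To make the idea of tags and links concrete, here is a minimal sketch that uses Python's built-in html.parser module to list the tags and hyperlink targets in a small page. The sample markup and the TagCollector class name are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample markup, used only for illustration.
SAMPLE = """<html>
  <body>
    <h1>Welcome</h1>
    <p>Read the <a href="about.html">about page</a>.</p>
  </body>
</html>"""


class TagCollector(HTMLParser):
    """Records every opening tag and every hyperlink target it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Called once for each opening tag, e.g. <p> or <a href="...">.
        self.tags.append(tag)
        if tag == "a":
            # attrs is a list of (name, value) pairs.
            self.links.extend(value for name, value in attrs if name == "href")


collector = TagCollector()
collector.feed(SAMPLE)
print(collector.tags)   # ['html', 'body', 'h1', 'p', 'a']
print(collector.links)  # ['about.html']
```

The tags give the page its structure, while the href attribute of the a tag is the hypertext part: it points to another page.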


BeautifulSoup module: -

  • The BeautifulSoup module is used for parsing, accessing and modifying HTML.
  • It builds a parse tree from an HTML document, which can then be used to navigate, search, extract and modify the data in that document.
  • Install the BeautifulSoup4 module by using the following command:

 

                                py -m pip install BeautifulSoup4
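Once the module is installed, the navigate, search and modify operations mentioned above can be sketched roughly as follows. The markup is a made-up example, and "html.parser" is chosen here only because it ships with Python; other parsers can be used as well.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, for illustration only.
html = """<html><head><title>Demo page</title></head>
<body>
  <h1 id="top">Hello</h1>
  <p class="intro">First paragraph.</p>
  <p>Second paragraph.</p>
</body></html>"""

# Build the parse tree using Python's built-in parser.
soup = BeautifulSoup(html, "html.parser")

# Navigate: reach a tag by its name.
print(soup.title.string)

# Search: find the first matching tag, or all matching tags.
intro = soup.find("p", class_="intro")
paragraphs = soup.find_all("p")
print(intro.get_text())
print(len(paragraphs))

# Modify: change the text and an attribute in place.
soup.h1.string = "Hello, world"
soup.h1["id"] = "heading"
print(soup.h1)
```

Note that the package is installed as BeautifulSoup4 but imported as bs4.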

 

Creating an HTML file named demo.html: -
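The contents of the original demo.html are not reproduced here, so the following is only a stand-in: a minimal sketch that writes a small, hypothetical page to demo.html from Python. The title, heading, paragraph and link are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical page content; a stand-in for the original demo.html.
DEMO_HTML = """<!DOCTYPE html>
<html>
<head>
  <title>Demo Page</title>
</head>
<body>
  <h1>Processing HTML in Python</h1>
  <p>This is a sample paragraph.</p>
  <a href="https://www.python.org">Python home page</a>
</body>
</html>
"""

# Write the page to demo.html in the current working directory.
path = Path("demo.html")
path.write_text(DEMO_HTML, encoding="utf-8")
print(f"Wrote {path}")
```

The same file can of course be created by hand in any text editor and opened in a browser.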



                    output: -




To extract the content from HTML tags, we use the following Python code:

Example :
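The original example is not reproduced here, so the code below is a sketch of one way to do it, not the exact code from this post. It assumes a hypothetical demo.html (written to a temporary folder so the example runs on its own) and then reads the title, heading, paragraphs and links with BeautifulSoup.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Stand-in markup; the real demo.html may look different.
DEMO_HTML = """<html>
<head><title>Demo Page</title></head>
<body>
  <h1>Processing HTML in Python</h1>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
  <a href="https://www.python.org">Python home page</a>
</body>
</html>"""

# Create the file in a temporary folder so the example is self-contained.
demo_path = Path(tempfile.mkdtemp()) / "demo.html"
demo_path.write_text(DEMO_HTML, encoding="utf-8")

# Open the file and build the parse tree.
with open(demo_path, encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

# Extract the content of individual tags.
title = soup.title.string
heading = soup.h1.get_text()
paragraphs = [p.get_text() for p in soup.find_all("p")]
links = [a["href"] for a in soup.find_all("a")]

print("Title:", title)
print("Heading:", heading)
print("Paragraphs:", paragraphs)
print("Links:", links)
```

To run this against your own file, replace demo_path with the path to that file and drop the lines that create the temporary copy.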


        output: -


