When using bs4 to parse a local HTML file, an encoding error occurred. The code is as follows:
    # -*- coding: utf-8 -*-
    # @Time : 2022/2/20 17:46
    # @File : bs4 Data analysis.py
    # @Software : PyCharm

    # Parsing data with bs4
    # How data parsing works:
    #   1. Locate the tag.
    #   2. Extract the tag's text, or the data values stored in its attributes.
    # How bs4 works:
    #   1. Instantiate a BeautifulSoup object and load the page source into it.
    #   2. Call the BeautifulSoup object's methods to locate tags and extract data.
    # Environment setup:
    #   pip install bs4
    #   pip install lxml
    from bs4 import BeautifulSoup

    # Instantiating the object
    # 1. Load a local HTML file into the object.
    fp = open('./sogou.html', 'r', encoding='utf-8')
    soup = BeautifulSoup(fp, 'lxml')
    fp.close()
    print(soup)
    # 2. Load page source fetched from the Internet into the object.
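Running it raises this error:
UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 7819: illegal multibyte sequence
The file itself is read correctly as UTF-8. The failure happens in print(), because standard output on this console uses GBK, which cannot represent the non-breaking space '\xa0'.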
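The cause can be reproduced without BeautifulSoup or the HTML file. The following is a minimal sketch; the helper name `can_encode` is hypothetical and only used for illustration. It checks whether a string can be encoded with a given codec, which is what print() has to do before writing to the console.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with the codec `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


nbsp = "\xa0"  # non-breaking space, the character named in the error message

print(can_encode(nbsp, "gbk"))    # GBK has no mapping for this character
print(can_encode(nbsp, "utf-8"))  # UTF-8 can encode any Unicode character
```

If the first call reports that the character cannot be encoded, any print() to a GBK console will fail on a page that contains it, regardless of how the file was opened.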
Solution:
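    import io
    import sys

    from bs4 import BeautifulSoup

    # Change the encoding of standard output to UTF-8
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')

    fp = open('./sogou.html', 'r', encoding='utf-8')
    soup = BeautifulSoup(fp, 'lxml')
    fp.close()
    # The first positional argument of decode() is not an encoding,
    # so print the soup directly now that stdout is UTF-8.
    print(soup)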
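If replacing sys.stdout is not desirable, one alternative is to make the text safe for the console's own encoding before printing it. This is only a sketch under that assumption; the helper name `to_console_safe` is hypothetical, and characters the console cannot show are replaced with `?`, so the output is lossy.

```python
import sys


def to_console_safe(text: str, encoding: str) -> str:
    """Replace characters that `encoding` cannot represent with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


# Fall back to UTF-8 when stdout does not report an encoding.
console_encoding = getattr(sys.stdout, "encoding", None) or "utf-8"

sample = "Sogou\xa0Search"  # stands in for str(soup) in the script above
print(to_console_safe(sample, console_encoding))
```

In the original script this would be used as `print(to_console_safe(str(soup), console_encoding))`. On Python 3.7 and later, `sys.stdout.reconfigure(encoding='utf-8')` is another way to switch the output encoding without building a new TextIOWrapper.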