
Python Data Extraction with bs4

Editor: Python

When using bs4 to parse a local HTML file and print the result, an encoding error occurred. The code was as follows:

# -*- coding: utf-8 -*-
# @Time : 2022/2/20 17:46
# @File : bs4 Data analysis.py
# @Software : PyCharm
# Data parsing with bs4
# How data parsing works: 1. locate the tag; 2. extract the tag's text, or the data stored in its attributes
# How bs4 does it: 1. instantiate a BeautifulSoup object and load the page source into it
#                  2. call the BeautifulSoup object's attributes and methods to locate tags and extract data
# Environment setup: pip install bs4, then pip install lxml
from bs4 import BeautifulSoup
# Object instantiation
# 1. Load a local HTML file into the object
with open('./sogou.html', 'r', encoding='utf-8') as fp:
    soup = BeautifulSoup(fp, 'lxml')
print(soup)
# 2. Load page source fetched from the Internet into the object

Running it raised this error: UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 7819: illegal multibyte sequence
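The error message names the 'gbk' codec, which suggests the failure happens when print() encodes the text for standard output, not when the UTF-8 file is read. A minimal sketch of that behaviour, using only the standard library; the sample string and the helper name can_encode are made up for illustration:

```python
# U+00A0 is the non-breaking space that HTML pages often contain (&nbsp;).
text = "sogou\xa0search"  # hypothetical sample, not taken from sogou.html


def can_encode(s, encoding):
    """Return True if every character of s can be encoded with the given codec."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(text, "gbk"))    # GBK has no mapping for U+00A0
print(can_encode(text, "utf-8"))  # UTF-8 can encode any Unicode character
```

So the parsed document itself is fine; only the output encoding needs to change.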

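Solution: the error is raised by print(), because standard output is using the GBK encoding, which cannot represent the character '\xa0'. Re-wrapping standard output as UTF-8 avoids it: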

import io
import sys
from bs4 import BeautifulSoup

# Re-wrap standard output so that print() encodes text as UTF-8 instead of GBK
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')
with open('./sogou.html', 'r', encoding='utf-8') as fp:
    soup = BeautifulSoup(fp, 'lxml')
print(soup)  # stdout is UTF-8 now, so no extra decode() call is needed
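To see why wrapping the output stream helps, the sketch below writes the same text through io.TextIOWrapper with two different encodings. It uses an in-memory io.BytesIO in place of sys.stdout.buffer so that it runs on any platform; the helper name write_through and the sample text are assumptions made for illustration.

```python
import io


def write_through(encoding, text):
    """Write text through a TextIOWrapper and return the raw bytes it produced."""
    raw = io.BytesIO()  # stands in for sys.stdout.buffer
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # release the buffer without closing it
    return data


sample = "a\xa0b"  # hypothetical text containing a non-breaking space

print(write_through("utf-8", sample))  # b'a\xc2\xa0b'

try:
    write_through("gbk", sample)
except UnicodeEncodeError as exc:
    print("gbk wrapper failed:", exc.reason)
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') should achieve the same effect as replacing sys.stdout by hand.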
