
A Python power tool! Automatically recognize provinces and cities in text and plot them on a map

Editor: Python

In NLP (natural language processing) tasks, we often need to identify and extract provinces, cities and administrative districts from text. We could search a keyword table entry by entry to do the extraction, but that means first collecting a keyword list of every province and city, which is fairly tedious.
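
For comparison, the keyword-table approach mentioned above might look like the following minimal sketch. The table PLACE_KEYWORDS and the function find_places are hypothetical names invented for this illustration, and the tiny table stands in for the full list you would otherwise have to collect yourself.

```python
# Hypothetical miniature keyword table; a real one would need every
# province, city and district in China.
PLACE_KEYWORDS = {
    "province": ["广东省", "四川省"],   # Guangdong, Sichuan
    "city": ["深圳市", "德阳市"],       # Shenzhen, Deyang
    "district": ["福田区", "广汉市"],   # Futian District, Guanghan (county-level city)
}


def find_places(text):
    """Return, per level, the table keywords that occur in text."""
    return {
        level: [name for name in names if name in text]
        for level, names in PLACE_KEYWORDS.items()
    }


# Made-up sample address: Guangdong Province, Shenzhen City, Futian District ...
result = find_places("广东省深圳市福田区深南中路")
print(result)
```

Plain substring matching like this has no notion of hierarchy, so it cannot tell you which city a district belongs to when the city is not written out.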

Today I will introduce a module that does this for you: pass it a string and it returns the province, city and district keywords found in that string, and it can also mark them on a map. It is the cpca module.

1. Preparation

Before you start, make sure that Python and pip are installed on your computer. If they are not, see the article "Super-Detailed Python Installation Guide" to install them.

(Option 1) If you use Python mainly for data analysis, you can install Anaconda directly (see "Anaconda: A Good Helper for Python Data Analysis and Mining"). It comes with Python and pip built in.

(Option 2) The VSCode editor is also recommended for its many advantages (see "VSCode: The Best Partner for Python Programming, a Detailed Guide").

Open a terminal in one of the following ways and enter the command to install the dependency:
1. Windows: open Cmd (Start > Run > CMD).
2. macOS: open Terminal (Command+Space, then type Terminal).
3. If you use the VSCode editor or PyCharm, you can use its built-in Terminal directly.

pip install cpca

Note that the cpca module currently supports only Python 3 and above.
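
If a script may end up running under an older interpreter, a guard such as the following sketch fails early with a clear message. The helper name is_supported is an assumption made for this example; it is not part of cpca.

```python
import sys


def is_supported(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3 or higher."""
    return version_info[0] >= 3


# Stop before importing cpca if the interpreter is too old.
if not is_supported():
    raise SystemExit("cpca needs Python 3; found Python %d.%d" % sys.version_info[:2])
```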

On Windows, the following problem may occur:

Building wheel for pyahocorasick (setup.py) ... error

Download and install the Microsoft Visual C++ Build Tools first (the original post links to the download), then run pip install cpca again. This should resolve the problem.

2. Basic use

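Two lines of code are enough for basic province, city and district extraction. Note that cpca matches Chinese place names, so real input must be Chinese text; the sample strings here are English renderings of the Chinese originals: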

# Official account: Python Practical Treasure
# 2022/06/23
import cpca
location_str = [
    "Shennan Middle Road, Bating Street, Futian District, Shenzhen City, Guangdong Province 1025, New Town Building, 1st floor",
    "Tesla's Shanghai super factory is Tesla's first super factory outside the United States, located in Shanghai, People's Republic of China.",
    "The Sanxingdui site lies on the bank of the Yazi River in Sanxingdui Town, west of Guanghan City, Sichuan Province, China; it is a Bronze Age cultural site"
]
df = cpca.transform(location_str)
print(df)

The output is as follows:

   province            city      district         address                                                                                         adcode
0  Guangdong Province  Shenzhen  Futian District  Shennan Middle Road, Bating Street 1025, New Town Building, 1st floor                           440304
1  Shanghai            None      None             .                                                                                               310000
2  Sichuan Province    Deyang    Guanghan City    on the bank of the Yazi River in Sanxingdui Town, west of the city, a Bronze Age cultural site  510681

Look at the third row, Guanghan City: cpca not only recognizes the county-level city of Guanghan in the sentence, it also automatically maps it to Deyang City, the prefecture-level city that administers it. That is quite powerful.
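
One way to see how a district can lead back to its city is the structure of the six-digit adcode itself: in China's administrative division codes, the first two digits identify the province and the first four the prefecture-level city. The sketch below derives the parent codes from the adcode in the output. parent_adcodes is a hypothetical helper, and whether cpca performs the lookup this way internally is not something the source states.

```python
def parent_adcodes(adcode):
    """Split a 6-digit division code into (province, city, district) codes."""
    code = str(adcode)
    if len(code) != 6 or not code.isdigit():
        raise ValueError("expected a 6-digit code, got %r" % (adcode,))
    # Zero-pad the 2-digit and 4-digit prefixes back to full 6-digit codes.
    return code[:2] + "0000", code[:4] + "00", code


province, city, district = parent_adcodes("510681")  # Guanghan City
print(province, city, district)  # 510000 510600 510681
```

Here 510600 is the code of Deyang City. Some county-level divisions are administered directly by their province and have no real prefecture-level parent, so the four-digit prefix should be treated as a hint and checked against an actual code table.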

If you want to know where in the string each province, city or district name was found, add the pos_sensitive=True parameter:

# Official account: Python Practical Treasure
# 2022/06/23
import cpca
location_str = [
    "Shennan Middle Road, Bating Street, Futian District, Shenzhen City, Guangdong Province 1025, New Town Building, 1st floor",
    "Tesla's Shanghai super factory is Tesla's first super factory outside the United States, located in Shanghai, People's Republic of China.",
    "The Sanxingdui site lies on the bank of the Yazi River in Sanxingdui Town, west of Guanghan City, Sichuan Province, China; it is a Bronze Age cultural site"
]
df = cpca.transform(location_str, pos_sensitive=True)
print(df)

The output is as follows:

(base) G:\push\20220623>python 1.py
   province            city      district         address                                                                                         adcode  province_pos  city_pos  district_pos
0  Guangdong Province  Shenzhen  Futian District  Shennan Middle Road, Bating Street 1025, New Town Building, 1st floor                           440304  0             3         6
1  Shanghai            None      None             .                                                                                               310000  38            -1        -1
2  Sichuan Province    Deyang    Guanghan City    on the bank of the Yazi River in Sanxingdui Town, west of the city, a Bronze Age cultural site  510681  9             -1        12

The extra columns give the index in the input string at which each recognized province, city and district keyword starts. A name that was inferred rather than found in the text, such as Deyang City here, is marked as -1.
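
The -1 convention is the same one Python's own str.find uses for a missing substring. The following sketch reproduces the idea with plain string search. keyword_positions is a hypothetical helper and the short input string is made up for illustration, so this shows how to read the position columns rather than how cpca computes them.

```python
def keyword_positions(text, names):
    """Map each name to its first index in text, or -1 if it does not occur."""
    return {name: text.find(name) for name in names}


# Made-up input: "located in Guanghan City, Sichuan Province".
# Deyang City (德阳市) is deliberately not written in the text.
text = "位于四川省广汉市"
positions = keyword_positions(text, ["四川省", "德阳市", "广汉市"])
print(positions)
```

The province and the county-level city get real offsets into the string, while Deyang City, which never appears in the text, comes back as -1.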

3. Advanced use

cpca can also pick out multiple regions from a long passage of text in one pass:

# Official account: Python Practical Treasure
# 2022/06/23
import cpca
long_text = "Any assessment of a city carries personal feelings. If you like a city, you probably like who you were at that time and place. "\
    "I have studied and worked in Guangzhou and Hong Kong, bought a home and lived briefly in Shenzhen, and made several business trips to Beijing. "\
    "I would like to focus on Guangzhou, Shenzhen and Hong Kong, with a word on Beijing. Overall, Guangzhou feels comfortable, "\
    "Hong Kong is refined, Shenzhen is young with a good atmosphere, and Beijing has a rough atmosphere. I have chosen Guangzhou."
df = cpca.transform_text_with_addrs(long_text, pos_sensitive=True)
print(df)

The output is as follows:

(base) G:\push\20220623>python 1.py
    province                                 city       district  address  adcode  province_pos  city_pos  district_pos
0   Guangdong Province                       Guangzhou  None               440100  -1            44        -1
1   Hong Kong Special Administrative Region  None       None               810000  47            -1        -1
2   Guangdong Province                       Shenzhen   None               440300  -1            58        -1
3   Beijing                                  None       None               110000  71            -1        -1
4   Guangdong Province                       Guangzhou  None               440100  -1            86        -1
5   Guangdong Province                       Shenzhen   None               440300  -1            89        -1
6   Hong Kong Special Administrative Region  None       None               810000  92            -1        -1
7   Beijing                                  None       None               110000  100           -1        -1
8   Guangdong Province                       Guangzhou  None               440100  -1            110       -1
9   Hong Kong Special Administrative Region  None       None               810000  115           -1        -1
10  Guangdong Province                       Shenzhen   None               440300  -1            120       -1
11  Beijing                                  None       None               110000  128           -1        -1
12  Guangdong Province                       Guangzhou  None               440100  -1            143       -1
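
Because every mention becomes its own row, the result is easy to aggregate. As a small sketch using only the standard library, the adcode values from the output above can be tallied with collections.Counter. The list is copied by hand here so that the example runs without cpca installed.

```python
from collections import Counter

# adcode column copied from the output above, in row order.
adcodes = [
    "440100", "810000", "440300", "110000",
    "440100", "440300", "810000", "110000",
    "440100", "810000", "440300", "110000",
    "440100",
]

# Count how many times each region is mentioned in the passage.
mention_counts = Counter(adcodes)
for code, count in mention_counts.most_common():
    print(code, count)
```

Guangzhou (440100) is mentioned four times and the other three regions three times each.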

Beyond that, the module also comes with some simple drawing tools that can plot the output above on a map as a heat map:

# Official account: Python Practical Treasure
# 2022/06/23
import cpca
from cpca import drawer
long_text = "Any assessment of a city carries personal feelings. If you like a city, you probably like who you were at that time and place. "\
    "I have studied and worked in Guangzhou and Hong Kong, bought a home and lived briefly in Shenzhen, and made several business trips to Beijing. "\
    "I would like to focus on Guangzhou, Shenzhen and Hong Kong, with a word on Beijing. Overall, Guangzhou feels comfortable, "\
    "Hong Kong is refined, Shenzhen is young with a good atmosphere, and Beijing has a rough atmosphere. I have chosen Guangzhou."
df = cpca.transform_text_with_addrs(long_text, pos_sensitive=True)
drawer.draw_locations(df[cpca._ADCODE], "df.html")

Running this may raise the following error:

(base) G:\push\20220623>python 1.py
Traceback (most recent call last):
  File "1.py", line 12, in <module>
    drawer.draw_locations(df[cpca._ADCODE], "df.html")
  File "G:\Anaconda3\lib\site-packages\cpca\drawer.py", line 41, in draw_locations
    import folium
ModuleNotFoundError: No module named 'folium'

Install the missing module with pip:

pip install folium
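
To find out about missing packages before the script fails halfway through, a check like the following can be run first. It is a sketch built on importlib.util.find_spec from the standard library; missing_modules is a hypothetical helper name.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# cpca imports folium only when drawing, so check both up front.
missing = missing_modules(["cpca", "folium"])
if missing:
    print("Install first: pip install " + " ".join(missing))
```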

Then rerun the code. It generates df.html in the current directory; double-click the file to open it and view the heat map.

What do you think, isn't it convenient? From now on, this module should be all you need for location recognition.

For more details, see the project's GitHub page. The README is written entirely in Chinese and is easy to read:

https://github.com/DQinYuan/chinese_province_city_area_mapper

If you cannot access GitHub, you can also reply "cpca" to the Python Practical Treasure official account to download the full project.


Python Practical Treasure (pythondict.com)