Project background
Project operation
1. General word cloud rendering
2. Drawing a word cloud from word frequencies
Conclusion
Project background
There are plenty of ready-made tools for making word cloud images, but they tend to share a few problems:
Problem 1: There are too many tools of uneven quality, and choosing among them is paralyzing.
Problem 2: Most word cloud tools are restricted in one way or another, leaving little room for customization.
Problem 3: Some tools even charge a fee.
Given these problems, I felt it was worth writing an article on drawing word clouds with Python, because it really is simple. If a complete beginner with no programming background can do it, why go hunting for a tool?
OK, fine. Enough talk; let's get straight to practice.
Project operation
1. General word cloud rendering
To make a word cloud you first need words. Where should they come from? DeeDee thought about it for a long time without an answer. With no better idea, let's use the viral "Hou Lang" advertorial as material. Opinions on Hou Lang differ, and DeeDee won't venture a comment.
First, save the full text of Hou Lang as HL.txt. An excerpt looks like this:
Next, install and import the libraries needed to make the word cloud. The role of each library is noted in the comments.
import jieba                       # Chinese word segmentation
from wordcloud import WordCloud    # word cloud rendering
from PIL import Image              # image processing
import numpy as np                 # multidimensional arrays and matrix operations
import matplotlib.pyplot as plt    # plotting
Then read HL.txt.
# Read the text
with open('HL.txt', 'r', encoding="UTF-8") as f:
    file = f.read()  # read the whole text as one string; readlines would read it line by line
Next, the whole string has to be split into words. When jieba goes to war, nothing is left standing.
# Segment the text into words
data_cut = jieba.cut(file, cut_all=False)  # precise mode
After segmentation, I found that commas, semicolons, and full stops had also come out as separate words. That won't do, so we have to stop them. Build a stop-word list and remove the words you don't like. That's right: I just don't like "we" and "you".
stop_words = [",",".",";","、"," We "," You "] # Custom stop word list
Of course, someone will object that this only works because the text is short, so a hand-made stop-word list is convenient, and that with thousands of texts such a list won't be enough. OK, let's search Baidu for a stop-word list, download any one of them, and save it as stopwords.txt. This stopwords.txt contains 1,893 common stop words and looks like this:
With the stop-word list in hand, we read it in with Python.
stop_words = []  # create an empty list
with open("stopwords.txt", 'r', encoding='utf-8') as f:
    for line in f:
        word = line.strip()  # drop the trailing newline and surrounding whitespace
        if word:             # skip blank lines
            stop_words.append(word)  # add the stop word to the stop_words list
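The loading step can also be wrapped in a small helper. The following is a minimal sketch, not the author's code: `load_stop_words` and the demo file name are hypothetical, and it returns a set rather than a list so that duplicate entries collapse and later membership checks are fast.

```python
import os
import tempfile


def load_stop_words(path, encoding="utf-8"):
    """Return the stop words in `path` (one per line) as a set."""
    with open(path, "r", encoding=encoding) as f:
        # strip() drops the trailing newline; the `if` skips blank lines
        return {line.strip() for line in f if line.strip()}


# Demo with a throwaway file standing in for stopwords.txt
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "stopwords_demo.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("的\n了\n\n,\n的\n")
    demo_stop_words = load_stop_words(demo_path)

print(sorted(demo_stop_words))
```

With the real file, the call would simply be `load_stop_words("stopwords.txt")`.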
With the stop words ready, the next step is to remove them, which leaves the words we need.
data_result = [i for i in data_cut if i not in stop_words]  # keep only the words we need
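As a rough illustration of the filtering step, here is a sketch that works on a hand-written token list instead of real jieba output. The helper name `remove_stop_words` and the sample tokens are made up for this example; it also drops whitespace-only tokens, which the one-liner above keeps unless they appear in the stop-word list.

```python
def remove_stop_words(tokens, stop_words):
    """Keep tokens that are non-blank and not in `stop_words`."""
    stop_set = set(stop_words)  # set membership checks are O(1)
    return [t for t in tokens if t.strip() and t not in stop_set]


# Hypothetical segmenter output, for illustration only
sample_tokens = ["我们", "看着", "你们", ",", "满怀", "羡慕", "\n", "。"]
sample_stop_words = [",", "。", "我们", "你们"]

filtered_tokens = remove_stop_words(sample_tokens, sample_stop_words)
print(filtered_tokens)
```

Because `tokens` can be any iterable, the generator returned by `jieba.cut` could be passed in directly.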
Print data_result and it looks like this:
That is not what we want; we need the words as a single string. So we use join to concatenate all the words into a new string separated by spaces. The replace call here replaces newline characters (\n) with an empty string.
text = " ".join(data_result).replace("\n", "")  # concatenate into one string
print(text)
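One small detail of join followed by replace: if a newline survives as its own token, removing it leaves the two separators around it behind. The sketch below uses a made-up three-token list to show this, and one possible way to normalize the spacing. The extra space is probably harmless for drawing a word cloud, so treat the second step as optional.

```python
tokens = ["奔涌", "\n", "后浪"]  # hypothetical tokens, newline kept as a token

joined = " ".join(tokens).replace("\n", "")
# The newline token disappears, but both of its separators stay behind
print(repr(joined))

# split() with no arguments splits on any run of whitespace
normalized = " ".join(joined.split())
print(repr(normalized))
```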
Printing text shows the result:
Now that we have the words, we can start designing the word cloud image. All the words are Chinese, and WordCloud's default font does not support Chinese, so a font file path has to be specified or the characters will come out garbled. As a matter of personal taste, I chose a small regular-script font. You can pick a different font as you like; there are plenty of free fonts online.
wc = WordCloud(
    # Set the font; without it the Chinese text is garbled. The font file must be downloaded separately.
    font_path=" The demonstration is leisurely in regular script .ttf",
    background_color="black",
    max_words=5000,
)
With the configuration done, generate the image and display it.
# Generate the word cloud
wc.generate(text)
# Save the word cloud
wc.to_file("IMJG.jpg")  # save the image
# Display
plt.imshow(wc)   # render the word cloud as an image
plt.axis("off")  # turn off the axes
plt.show()       # show the image
The result looks like this:
At this point you may think DeeDee is about to wrap up. Sorry, we're not done yet. Our goal should not stop here; it lies in poetry and faraway places. Oh wait, no: it lies in customizing your own word cloud. DeeDee is going to give the word cloud a custom base image (a mask) to make it more vivid. I thought for a long time about which image would suit and couldn't decide, so I opened my long-unused Photoshop CC and drew a PNG that you could easily outdo with any simple photo-editing app.
I named the image JG.png and opened it with Image.open.
# Open the image with Image.open and convert it to an array
images = np.array(Image.open("JG.png"))
Pass images to the word cloud wc through the mask parameter.
wc = WordCloud(
    # Set the font; without it the Chinese text is garbled. The font file must be downloaded separately.
    font_path=" The demonstration is leisurely in regular script .ttf",
    background_color="black",
    max_words=5000,
    mask=images,
)
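To get a feel for what a mask is, it helps to build a tiny one by hand. According to the wordcloud documentation, pure white entries (255) in the mask are treated as masked out and all other entries are free to draw on. The sketch below builds a circular mask as plain nested lists; `circle_mask` is a hypothetical helper, and the 9x9 size is only for printing. Converting the result with `np.array(demo_mask, dtype=np.uint8)` should give an array that can be passed as `mask`, though that step is an assumption and is not exercised here.

```python
def circle_mask(size):
    """Return a size x size grid: 0 inside a centered circle, 255 outside."""
    center = (size - 1) / 2
    radius = size / 2
    return [
        [0 if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2 else 255
         for x in range(size)]
        for y in range(size)
    ]


demo_mask = circle_mask(9)
for row in demo_mask:
    # '#' marks drawable cells, '.' marks masked-out (white) cells
    print("".join("#" if v == 0 else "." for v in row))
```

A real mask would be far larger, typically the pixel size of the output image.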
Regenerate and save the word cloud. The result looks like this:
Ha, a bit ugly. If you're interested, make your own base image or download one online and try it. The base image should be as sharp as possible, with colors that stand out.
Some readers may ask why the word cloud at the top of this article shows whole sentences. The explanation: when reading HL.txt for that one, I used readlines.
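The difference between read and readlines is easy to check without the real file. This sketch uses `io.StringIO` as a stand-in for an open text file (the sample text is made up); file objects returned by `open` behave the same way for these two methods. How the lines were then turned into word-cloud entries is not shown in the article, so this only illustrates the two return types.

```python
import io

sample = "first line\nsecond line\n"

whole = io.StringIO(sample).read()        # one string containing everything
lines = io.StringIO(sample).readlines()   # list of lines, newlines kept

print(type(whole).__name__, repr(whole))
print(type(lines).__name__, lines)
```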
2. Drawing a word cloud from word frequencies
The method above covers ordinary word clouds, but real-world needs are often more complex, and drawing a word cloud from word frequencies is the more common case. Below is a practical example that J uses often, with the full source code.
The general idea: tens of thousands of transaction records are pulled from a MySQL database, an SQL statement selects the top 100 brands by transaction volume, and the word cloud is drawn from each brand's volume. The larger the volume, the larger the text.
# -*- coding: utf-8 -*-
# @Time : 2020/5/23 10:30 AM
# @Author : I am a J Brother
# @File : my_wordcloud.py
# Make a word cloud from given word frequencies
from matplotlib import pyplot as plt  # plotting, data visualization
from wordcloud import WordCloud       # word cloud
from PIL import Image                 # image processing
import numpy as np                    # matrix operations
import pymysql                        # database
import pandas as pd                   # data processing

# Prepare the words for the word cloud
conn = pymysql.connect(host="localhost", user=" Yours ", passwd=" Yours ", db="test", port=3306, charset="utf8")
cur = conn.cursor()
sql = "select brand as name,round(sum(jine)/10000,0) as value from Sc_month4 group by name order by value desc limit 100;"
df = pd.read_sql(sql, conn)
print(df)
name = list(df.name)  # words
value = df.value      # word frequencies
dic = dict(zip(name, value))  # store the word frequencies as a dictionary
# print(dic)
cur.close()
conn.close()

img = Image.open("tree.png")
img_arry = np.array(img)
wc = WordCloud(
    background_color="white",
    mask=img_arry,
    max_words=1000,
    max_font_size=500,
    # font_path=" Youziku dragon Tibetan script .ttf"
    font_path=" The demonstration is leisurely in regular script .ttf"
)
wc.generate_from_frequencies(dic)  # generate the word cloud from the frequencies

# Draw the image
fig = plt.figure(1)
plt.imshow(wc)
plt.axis("off")
# Save before plt.show(); saving afterwards can produce a blank image
plt.savefig("JGJG.jpg", dpi=400)
plt.show()
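For readers without a MySQL server at hand, the query-to-dictionary step can be tried with the standard library's sqlite3 module. This is only a sketch under stated assumptions: SQLite is swapped in for MySQL, the table is created in memory, and the brands and amounts are invented. The table and column names (Sc_month4, brand, jine) are taken from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sc_month4 (brand TEXT, jine REAL)")
conn.executemany(
    "INSERT INTO Sc_month4 (brand, jine) VALUES (?, ?)",
    [
        ("BrandA", 120000.0),  # made-up rows, for illustration only
        ("BrandB", 40000.0),
        ("BrandA", 80000.0),
        ("BrandC", 310000.0),
    ],
)

# Same shape as the article's query: total per brand, in units of 10,000
sql = (
    "SELECT brand AS name, ROUND(SUM(jine) / 10000, 0) AS value "
    "FROM Sc_month4 GROUP BY brand ORDER BY value DESC LIMIT 100"
)
frequencies = {name: value for name, value in conn.execute(sql)}
conn.close()

print(frequencies)
```

A dictionary of this shape, mapping each word to a number, is what `generate_from_frequencies` expects.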
The generated word cloud looks like this:
Conclusion
Overall, making a word cloud with Python is easy: the code is clear and short, which makes it well suited to beginners. Of course, a good-looking word cloud depends on clean, tidy data, so data-cleaning skills are a must.
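As a small illustration of the cleaning point, the sketch below tidies a token list and counts it with `collections.Counter`, giving a word-to-count dictionary of the kind used in section 2. The helper `word_frequencies`, the sample tokens, and the `min_length=2` rule for dropping single-character tokens are all assumptions made for this example, not part of the article's code.

```python
from collections import Counter


def word_frequencies(tokens, stop_words=(), min_length=2):
    """Count cleaned tokens; returns a plain dict of word -> count."""
    stop_set = set(stop_words)
    cleaned = (t.strip() for t in tokens)  # trim stray whitespace
    kept = [t for t in cleaned if len(t) >= min_length and t not in stop_set]
    return dict(Counter(kept))


# Hypothetical tokens with typical noise: padding, a newline, a particle
sample_tokens = ["后浪", " 后浪", "的", "奔涌", "\n", "我们", "后浪", "奔涌"]
freqs = word_frequencies(sample_tokens, stop_words=["我们"])
print(freqs)
```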