
Sharing Altair, a highly praised Python visualization module that is simple and fast!

Editor: Python

Today we will take a look at Altair, a visualization module for Python, and use it to draw some common charts. With Altair we can spend more time and energy on understanding the data and what it means, instead of getting bogged down in the mechanics of data visualization.

What is Altair?

Altair is described as a statistical visualization library because it helps us understand and analyze data comprehensively through classification and aggregation, data transformation, data interaction, and graphic composition. Installation is also very simple and can be done directly with pip, as follows

pip install altair
pip install vega_datasets
pip install altair_viewer

If you use the conda package manager, Altair can be installed as follows

conda install -c conda-forge altair vega_datasets

A first look at Altair

Let's start by drawing a simple bar chart. First, create a DataFrame to use as the dataset. The code is as follows

import pandas as pd
df = pd.DataFrame({"brand":["iPhone","Xiaomi","HuaWei","Vivo"],
                   "profit(B)":[200,55,88,60]})

Next comes the code that draws the bar chart

import altair as alt
import pandas as pd
import altair_viewer
chart = alt.Chart(df).mark_bar().encode(x="brand:N",y="profit(B):Q")
# Display the chart by calling the display() function
altair_viewer.display(chart,inline=True)

output

Looking at the overall syntax: we first use alt.Chart() to specify the dataset, then call one of the mark_*() instance methods to choose the style of chart, and finally specify the data represented by the X axis and the Y axis. You may be curious what the N and Q in the encoding stand for. They are abbreviations of variable types. In other words, Altair needs to know the type of each variable involved in the plot; only then will the chart come out the way we expect.

N stands for a nominal variable (Nominal); for example, a mobile phone brand is a name with no inherent order. Q stands for a quantitative, that is numerical, variable (Quantitative), which can be either discrete or continuous. In addition, there are time series data, abbreviated T, and ordinal variables, abbreviated O; for example, the one-to-five-star ratings given to merchants in online shopping.

Saving charts

To save the finished chart, we can call the save() method. The following code saves the chart as an HTML file

chart.save("chart.html")

It can also be saved as a JSON file, and the code is very similar

chart.save("chart.json")

Of course, the chart can also be saved as an image file, for example in PNG or SVG format; depending on the Altair version, this may require an extra package to be installed.

Advanced Altair operations

Building on the above, we can extend the example further. For instance, to draw a horizontal bar chart, we simply swap the data on the X axis and the Y axis. The code is as follows

chart = alt.Chart(df).mark_bar().encode(x="profit(B):Q", y="brand:N")
chart.save("chart1.html")

output

We can also draw a line chart by calling the mark_line() method. The code is as follows

import numpy as np
# Create a new dataset that uses the dates as the row index
np.random.seed(29)
value = np.random.randn(365)
data = np.cumsum(value)
date = pd.date_range(start="20220101", end="20221231")
df = pd.DataFrame({"num": data}, index=date)
line_chart = alt.Chart(df.reset_index()).mark_line().encode(x="index:T", y="num:Q")
line_chart.save("chart2.html")

output
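The "index:T" field in the line chart may look surprising. A short sketch of where it comes from: when the DataFrame's date index has no name, reset_index() turns it into an ordinary column called "index", which the chart can then reference. The five values below are made up for the illustration.

```python
import pandas as pd

# Five made-up values on a short date range, for illustration only.
dates = pd.date_range(start="20220101", end="20220105")
df = pd.DataFrame({"num": [0.5, 1.0, 0.2, -0.3, 0.1]}, index=dates)

# reset_index() moves the unnamed date index into a column named "index".
flat = df.reset_index()
columns = list(flat.columns)
print(columns)  # ['index', 'num']
```

If the index had a name, for example "date", the column would take that name instead and the encoding would be "date:T".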

We can also draw a Gantt chart, which is commonly used in project management. The X axis shows the dates, and the Y axis lists the projects, so each bar shows how far a project extends in time. The code is as follows

project = [{"project": "Proj1", "start_time": "2022-01-16", "end_time": "2022-03-20"},
 {"project": "Proj2", "start_time": "2022-04-12", "end_time": "2022-11-20"},
 # ...... (remaining projects omitted)
 ]
df = alt.Data(values=project)
chart = alt.Chart(df).mark_bar().encode(
    alt.X("start_time:T",
          axis=alt.Axis(format="%x",
                        formatType="time",
                        tickCount=3),
          scale=alt.Scale(domain=[alt.DateTime(year=2022, month=1, date=1),
                                  alt.DateTime(year=2022, month=12, date=1)])),
    alt.X2("end_time:T"),
    alt.Y("project:N", axis=alt.Axis(labelAlign="left",
                                     labelFontSize=15,
                                     labelOffset=0,
                                     labelPadding=50)),
    color=alt.Color("project:N", legend=alt.Legend(labelFontSize=12,
                                                   symbolOpacity=0.7,
                                                   titleFontSize=15)))
chart.save("chart_gantt.html")

output

From the chart above we can see the projects the team is working on. Each project is at a different stage of progress and, of course, spans a different length of time. Shown on a chart, this is very intuitive.
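The difference in time span can also be checked numerically. Here is a minimal sketch using only the standard library and the two projects listed in the code above; duration_days is a helper name made up for this example.

```python
from datetime import date

projects = [
    {"project": "Proj1", "start_time": "2022-01-16", "end_time": "2022-03-20"},
    {"project": "Proj2", "start_time": "2022-04-12", "end_time": "2022-11-20"},
]

def duration_days(item):
    """Return the number of days between a project's start and end dates."""
    start = date.fromisoformat(item["start_time"])
    end = date.fromisoformat(item["end_time"])
    return (end - start).days

durations = {p["project"]: duration_days(p) for p in projects}
print(durations)  # {'Proj1': 63, 'Proj2': 222}
```

These numbers correspond to the bar lengths in the Gantt chart: the second project runs for several times as long as the first.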

Next, let's draw a scatter plot by calling the mark_circle() method. The code is as follows

from vega_datasets import data
df = data.cars()
# Keep only the rows whose Origin is "USA", that is, the cars from the United States
df_1 = alt.Chart(df).transform_filter(
    alt.datum.Origin == "USA"
)
chart = df_1.mark_circle().encode(
    alt.X("Horsepower:Q"),
    alt.Y("Miles_per_Gallon:Q")
)
chart.save("chart_dots.html")

output

Of course, we can refine it further and make the chart more attractive by adding some color. The code is as follows

chart = df_1.mark_circle(color=alt.RadialGradient("radial",[alt.GradientStop("white", 0.0),
                                                 alt.GradientStop("red", 1.0)]),
              size=160).encode(
    alt.X("Horsepower:Q", scale=alt.Scale(zero=False,padding=20)),
    alt.Y("Miles_per_Gallon:Q", scale=alt.Scale(zero=False,padding=20))
)

output

We can also vary the size of the points, so that the size of each point represents a different value. The code is as follows

chart = df_1.mark_circle(color=alt.RadialGradient("radial",[alt.GradientStop("white", 0.0),
                                                 alt.GradientStop("red", 1.0)]),
              size=160).encode(
    alt.X("Horsepower:Q", scale=alt.Scale(zero=False, padding=20)),
    alt.Y("Miles_per_Gallon:Q", scale=alt.Scale(zero=False, padding=20)),
    size="Acceleration:Q"
)

output
