
1.28 Learning NumPy and pandas

Editor: Python

numpy

What is NumPy?

NumPy stands for Numerical Python.

It is an open-source scientific computing library.

Advantages of NumPy

  • Simpler code (operations work on whole arrays and matrices rather than on individual elements)

  • Better performance (more efficient storage and faster input/output)

  • NumPy is the foundation of Python's scientific and data libraries

Timing a function call

In IPython or Jupyter, use the %timeit magic command:

%timeit function_call
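
Outside IPython, the standard-library timeit module gives a similar measurement. The sketch below compares a pure-Python loop with the equivalent NumPy expression; the function names and the array size are chosen for illustration only, and the actual timings depend on the machine.

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # int64 avoids overflow when squaring


def python_sum_of_squares():
    # Element-by-element loop in pure Python
    return sum(v * v for v in py_list)


def numpy_sum_of_squares():
    # Whole-array operation executed inside NumPy
    return int(np.sum(np_arr ** 2))


t_list = timeit.timeit(python_sum_of_squares, number=20)
t_numpy = timeit.timeit(numpy_sum_of_squares, number=20)
print(f"list: {t_list:.4f}s  numpy: {t_numpy:.4f}s")
```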

The core of NumPy: array

All elements of an array have the same type.

Attributes of an array

  • shape: a tuple giving the size of the array along each dimension
  • ndim: a number, the number of dimensions of the array
  • size: a number, the total number of elements in the array
  • dtype: the data type of the elements
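
As a quick illustration of these four attributes, here is a minimal sketch on a small 2 x 4 array (the values are arbitrary):

```python
import numpy as np

X = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

print(X.shape)  # (2, 4): 2 rows, 4 columns
print(X.ndim)   # 2
print(X.size)   # 8
print(X.dtype)  # an integer type; the exact width depends on the platform
```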

Ways to create an array

  • From a Python list or nested lists
  • With functions such as arange, ones, zeros, empty, full, and eye
  • With the np.random module, which generates random numbers

Operations and functions supported by array

  • Element-wise addition, subtraction, multiplication, and division
  • Indexing designed for multi-dimensional arrays
  • Aggregate functions such as sum and mean
  • Linear algebra functions

Creating an array from a Python list

import numpy as np
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
X = np.array(
    [
        [1, 2, 3, 4],
        [5, 6, 7, 8]
    ]
)

Creating a sequence of numbers with arange

np.arange([start,] stop[, step], dtype=None)
np.ones / np.zeros / np.empty (uninitialized values) (shape, dtype=None, order='C')
np.full (a specified value) (shape, fill_value, dtype=None, order='C')
# For example
np.ones(10)
np.ones((2, 3))
# ones_like creates an array of ones with the same shape as x
np.ones_like(x)
# full creates an array filled with the specified value
np.full(10, 666)
np.full((2, 4), 666)
np.random.randn(d0, d1, d2, ..., dn)
# Generate an array of random numbers with the given dimensions
B = np.random.randn(2, 5)
# Change the shape of an array
A = np.arange(10).reshape(2, 5)
# reshape converts the 10-element sequence directly into shape 2 x 5
# Binary operations such as A + B are applied element by element
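
To make the last point concrete, here is a small sketch of element-wise binary operations on two arrays of the same shape. The array B of ones is an assumption made so that the results are easy to check by eye.

```python
import numpy as np

A = np.arange(10).reshape(2, 5)  # [[0..4], [5..9]]
B = np.ones((2, 5))

total = A + B    # adds 1 to every element of A
scaled = A * 2   # a scalar is applied to every element
product = A * B  # element-wise product, not matrix multiplication
```

Note that `*` multiplies matching elements; matrix multiplication uses `@` or `np.dot`.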

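Querying a NumPy array by index

  • Basic indexing
  • Fancy indexing
  • Boolean indexing

Basic indexing

Indices start at 0 when counting from the left and at -1 when counting from the right.

Slices can also be used as indices.

Indexing a two-dimensional array

x = np.arange(20).reshape(4, 5)  # example 2-D array (values assumed for illustration)
x[0][0]
x[0, 0]   # same as the line above
x[2]      # select the row at index 2
x[:-1]    # select multiple rows: all but the last
x[:, 2]   # select the column at index 2 from every row

Modifying a NumPy slice modifies the original array, because a slice is a view rather than a copy.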
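
The following sketch demonstrates that behaviour, and shows `.copy()` as the way to get an independent array when the original must stay untouched:

```python
import numpy as np

x = np.arange(10)

view = x[2:5]   # a slice is a view onto x
view[0] = 100   # this also changes x[2]

independent = x[2:5].copy()  # an explicit copy owns its data
independent[1] = -1          # x is not affected

print(x)  # x[2] is now 100, x[3] is still 3
```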

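Fancy indexing

Indexing with an array of integers is called fancy indexing.

x = np.arange(10)
x[[3, 4, 7]]  # returns array([3, 4, 7])
indexes = np.array([[0, 2], [1, 3]])
x[indexes]
# Returns an array with the same shape as indexes, holding the elements at those positions

Two-dimensional arrays

# X here needs at least 4 rows and 5 columns, e.g. X = np.arange(20).reshape(4, 5)
X[[0, 2], :]  # select rows 0 and 2; the trailing ", :" for the columns may be omitted
X[:, [0, 2]]  # select columns 0 and 2; the leading ":" for the rows cannot be omitted
X[[0, 2, 3], [1, 3, 4]]  # give row and column lists together; returns the elements at positions (0,1), (2,3), (3,4)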

Boolean indexing

Boolean indexing filters an array by a condition.

One-dimensional arrays

x = np.arange(10)
x > 5
# Returns an array of 10 booleans (True or False)
x[x > 5]  # returns the elements greater than 5
x[x < 5] += 20  # in-place update of the elements that satisfy the condition

Two-dimensional arrays

# Filtering a two-dimensional array
X[X > 5]
# X > 5 is a boolean array with the same shape as X; X[X > 5] returns a one-dimensional array, so the result loses a dimension
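
A short sketch of both cases on a 4 x 5 example array (the array itself is an assumption): a full-shape mask flattens the result, while a mask built from one column keeps whole rows.

```python
import numpy as np

X = np.arange(20).reshape(4, 5)

mask = X > 5        # boolean array, same shape as X
selected = X[mask]  # 1-D array of the matching elements (6..19)

# A 1-D mask built from column 3 selects entire rows instead
rows = X[X[:, 3] > 5]  # rows whose value in column 3 exceeds 5
```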

NumPy random numbers: np.random

Function                            Description
seed([seed])                        Set the random seed so that the same random numbers are generated on every run
rand(d0, d1, ..., dn)               Return samples drawn uniformly from [0, 1)
randn(d0, d1, ..., dn)              Return samples from the standard normal distribution (mean 0, variance 1)
randint(low[, high, size, dtype])   Generate random integers, including low but excluding high
random([size])                      Generate random numbers in [0.0, 1.0)
choice(a)                           a is a one-dimensional array; draw a random element from it
shuffle(x)                          Shuffle the array x in place
permutation(x)                      Return a randomly permuted copy of x, or a random permutation of 0..x-1 when x is an integer
normal([loc, scale, size])          Generate Gaussian-distributed numbers with mean loc and standard deviation scale
uniform([low, high, size])          Generate uniformly distributed numbers in [low, high)
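
A minimal sketch of how the seed makes results reproducible, together with two of the other functions. The seed value 666 is arbitrary.

```python
import numpy as np

np.random.seed(666)
first = np.random.rand(3)

np.random.seed(666)          # resetting the seed replays the same sequence
second = np.random.rand(3)

ints = np.random.randint(1, 10, size=5)  # integers in [1, 10)
perm = np.random.permutation(5)          # random ordering of 0..4
```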

NumPy data computation: from introduction to practice

Function        Description
np.sum          Sum of all elements
np.prod         Product of all elements
np.cumsum       Cumulative sum of the elements
np.cumprod      Cumulative product of the elements
np.min          Minimum value
np.max          Maximum value
np.percentile   Percentile, with q given in the range 0-100
np.quantile     Quantile, with q given in the range 0-1
np.median       Median
np.average      Average (optionally weighted)
np.mean         Mean
np.std          Standard deviation
np.var          Variance

The axis parameter in NumPy

axis=0 refers to rows and axis=1 refers to columns.

For aggregate functions such as sum and mean:

  • axis=0 collapses the rows, and axis=1 collapses the columns
  • axis=0 computes across rows (one result per column), and axis=1 computes across columns (one result per row)

Standardization: $A = \dfrac{A - \operatorname{mean}(A,\ \mathrm{axis}=0)}{\operatorname{std}(A,\ \mathrm{axis}=0)}$
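
The axis rules and the standardization formula can be checked on a small example. This is a sketch on an arbitrary 3 x 4 array; after standardization each column has mean 0 and standard deviation 1.

```python
import numpy as np

A = np.arange(12, dtype=float).reshape(3, 4)

col_sums = A.sum(axis=0)  # rows collapsed: one sum per column, shape (4,)
row_sums = A.sum(axis=1)  # columns collapsed: one sum per row, shape (3,)

# Column-wise standardization; broadcasting subtracts each column's mean
standardized = (A - A.mean(axis=0)) / A.std(axis=0)
```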

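Selecting the elements that satisfy a condition in NumPy

import numpy as np
# 100 million random integers in [1, 10000); reduce size if memory is limited
arr = np.random.randint(1, 10000, size=int(1e8))
arr[arr > 5000]  # keep only the elements greater than 5000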

How to add a dimension to a NumPy array

  • np.newaxis: a constant (an alias for None) used inside index syntax to add a dimension to an array
  • np.expand_dims(arr, axis): a function that does the same as np.newaxis, adding a dimension to arr at position axis
  • np.reshape(a, newshape): a function; setting one dimension of the new shape to 1 adds a dimension

Adding a dimension to a one-dimensional vector (newaxis)

arr[np.newaxis, :]  # add a row dimension: shape (n,) becomes (1, n)
arr[:, np.newaxis]  # add a column dimension: shape (n,) becomes (n, 1)

Adding a dimension to a one-dimensional array (expand_dims, reshape)

np.expand_dims(arr, axis=0)
np.reshape(arr, (1, 5))  # assumes arr has 5 elements
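
The three approaches can be compared by looking at the resulting shapes. A minimal sketch, assuming a 5-element vector:

```python
import numpy as np

arr = np.arange(5)                      # shape (5,)

row = arr[np.newaxis, :]                # shape (1, 5)
col = arr[:, np.newaxis]                # shape (5, 1)
expanded = np.expand_dims(arr, axis=0)  # shape (1, 5)
reshaped = np.reshape(arr, (1, 5))      # shape (1, 5)

print(row.shape, col.shape, expanded.shape, reshaped.shape)
```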

Merging data

  • Adding multiple rows

  • Adding multiple columns

np.concatenate(array_list, axis=0)  # merge along the given axis: 0 for rows, 1 for columns
np.vstack(array_list)
np.row_stack(array_list)     # both merge the data by row (row_stack is an alias of vstack)
np.hstack(array_list)
np.column_stack(array_list)  # both merge the data by column
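
A small sketch of merging two 2 x 3 arrays by row and by column; the arrays are made up for the example.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(6, 12).reshape(2, 3)

rows = np.concatenate([a, b], axis=0)  # more rows: shape (4, 3)
cols = np.concatenate([a, b], axis=1)  # more columns: shape (2, 6)

# vstack and hstack give the same results for 2-D inputs
same_rows = np.vstack([a, b])
same_cols = np.hstack([a, b])
```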

Pandas

An open-source Python library for data analysis.

Reading data with pandas

  • CSV, TSV, and TXT
    • Plain-text files delimited by commas or tabs
    • pd.read_csv
  • Excel
    • Microsoft xls or xlsx files
    • pd.read_excel
  • MySQL
    • Relational database tables
    • pd.read_sql

Reading a CSV file

import pandas as pd
ratings = pd.read_csv(fpath)
ratings.head()     # look at the first few rows
ratings.shape      # shape of the data (rows, columns)
ratings.columns    # list of column names
ratings.index      # the row index
ratings.dtypes     # data type of each column

Reading a TXT file

pru = pd.read_csv(
    fpath,
    sep="\t",                     # field separator
    header=None,                  # the file has no header row
    names=['pdate', 'pv', 'uv']   # column names to use
)

Reading an Excel file

puv = pd.read_excel(fpath)

Reading a MySQL table

import pymysql
conn = pymysql.connect(
    host='127.0.0.1',
    user='root',
    password='',
    database='txy',
    charset='utf8'
)
mysql_page = pd.read_sql("select * from txy", con=conn)

Pandas data structures

  • DataFrame
  • Series

DataFrame

Two-dimensional data: a whole table with multiple rows and columns.

df.columns is the index of the columns, and df.index is the index of the rows.

Series

One-dimensional data: a single row or a single column.

A Series is an object similar to a one-dimensional array. It consists of a set of data (possibly of different types) and an associated set of labels, the index.

Creating a Series from a list

s1 = pd.Series([1, 'a', 5, 2, 7])
# When printed, the index is on the left and the data on the right
s1.index   # get the index
s1.values  # get the data
# Create a Series with a label index
s2 = pd.Series([1, 'a', 5.2, 7], index=['d', 'b', 'a', 'c'])

Creating a Series from a Python dictionary

s3 = pd.Series(sdata_dict)
# The keys become the index and the values become the data

Querying data by label

s2['a']         # query a single value
s2[['b', 'a']]  # query several values; returns a Series
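
Here is a self-contained sketch of the dictionary case followed by label queries. The dictionary contents are hypothetical, since the notes do not show what sdata_dict holds.

```python
import pandas as pd

# Hypothetical data: keys become the index, values become the data
sdata_dict = {"Ohio": 35000, "Texas": 72000, "Oregon": 16000}
s3 = pd.Series(sdata_dict)

single = s3["Texas"]             # one label: a scalar value
subset = s3[["Oregon", "Ohio"]]  # a list of labels: a Series, in the order requested
```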

DataFrame

  • Each column can have a different data type
  • It has both a row index (index) and a column index (columns)
  • It can be seen as a dictionary of Series

Creating a DataFrame from a dictionary of lists

data = {
    'state': ['Ohio', 'Nevada'],
    'year': [2000, 2002],
    'pop': [1.5, 1.7]
}
df = pd.DataFrame(data)

Querying a Series from a DataFrame

  • Querying a single row or a single column returns a pd.Series
  • Querying multiple rows or multiple columns returns a pd.DataFrame

df['year']           # one column: Series
df[['year', 'pop']]  # several columns: DataFrame
df.loc[1]            # one row: Series
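
The return types can be verified directly. A minimal sketch using the same state/year/pop data:

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["Ohio", "Nevada"],
    "year": [2000, 2002],
    "pop": [1.5, 1.7],
})

one_col = df["year"]            # single column
two_cols = df[["year", "pop"]]  # list of columns
one_row = df.loc[1]             # single row
two_rows = df.loc[0:1]          # label slice of rows (end label included)

for result in (one_col, two_cols, one_row, two_rows):
    print(type(result).__name__)
```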

Querying data with pandas

The df.loc method

Queries by row and column label values.

  • Pass a single value for the row or column for an exact match
  • Pass a list for a batch query
  • Pass an interval for a range query; both the start and the end of the interval are included
  • Pass a boolean condition for a conditional query
  • Pass a lambda or a named function that returns the selection

The df.iloc method

Queries by row and column integer position.

The df.where method

The df.query method
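
The query styles listed above can be sketched on a tiny weather table. The dates and temperatures are invented for the example; only the column names bWendu and yWendu come from these notes.

```python
import pandas as pd

df = pd.DataFrame(
    {"bWendu": [3, 2, 12, 15], "yWendu": [-6, -5, 1, 3]},
    index=["2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04"],
)

single = df.loc["2018-01-03", "bWendu"]                 # exact match
batch = df.loc[["2018-01-01", "2018-01-04"], "bWendu"]  # list of labels
ranged = df.loc["2018-01-02":"2018-01-03", :]           # both ends included
cond = df.loc[df["bWendu"] > 10, :]                     # boolean condition
func = df.loc[lambda d: d["yWendu"] < 0, :]             # callable

by_pos = df.iloc[0:2, 0]   # integer positions; the end position is excluded
queried = df.query("bWendu > 10 and yWendu > 1")  # condition as a string
masked = df.where(df > 0)  # keeps the shape; non-matching cells become NaN
```

Note the difference between loc and iloc slices: a label slice includes its end, while a position slice does not.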

Adding new columns in pandas

Direct assignment

df.loc[:, "wencha"] = df["bWendu"] - df["yWendu"]

The df.apply method

df.loc[:, "wendu_type"] = df.apply(get_wendu_type, axis=1)  # get_wendu_type receives one row at a time

The df.assign method

# Several new columns can be added at once; assign returns a new DataFrame
df.assign(
    yWendu_huashi=lambda x: x["yWendu"] * 9 / 5 + 32,  # the value can be a function
    bWendu_huashi=lambda x: x["bWendu"] * 9 / 5 + 32
)

Selecting groups by condition and assigning values to each

df['wencha_type'] = ''
df.loc[df["bWendu"] - df["yWendu"] > 10, "wencha_type"] = "large temperature difference"
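
The four approaches can be run end to end on a small table. In this sketch the data, the thresholds inside get_wendu_type, and the labels "hot", "cold", "mild", "normal", and "large" are all assumptions, since the notes do not define them.

```python
import pandas as pd

df = pd.DataFrame({"bWendu": [3, 12, 25], "yWendu": [-6, 1, 12]})

# 1. Direct assignment
df.loc[:, "wencha"] = df["bWendu"] - df["yWendu"]


# 2. apply with axis=1 calls the function once per row
def get_wendu_type(row):
    # Hypothetical thresholds, for illustration only
    if row["bWendu"] > 20:
        return "hot"
    if row["yWendu"] < 0:
        return "cold"
    return "mild"


df.loc[:, "wendu_type"] = df.apply(get_wendu_type, axis=1)

# 3. assign returns a new DataFrame, so the result is bound back to df
df = df.assign(bWendu_huashi=lambda x: x["bWendu"] * 9 / 5 + 32)

# 4. Conditional assignment: set a default, then overwrite the matching rows
df["wencha_type"] = "normal"
df.loc[df["wencha"] > 10, "wencha_type"] = "large"
```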

Statistical functions

Summary statistics

df.describe()  # summary statistics for all numeric columns

Unique values and counts by value

  • For non-numeric columns

df["fengxiang"].unique(): lists all distinct values in the column

df["fengxiang"].value_counts(): counts the occurrences of each value

Correlation coefficient and covariance

Correlation coefficient: measures the degree of similarity. 1 is the maximum positive similarity and -1 is the maximum negative similarity.

Covariance: measures whether two variables move in the same or in opposite directions. Positive: they change in the same direction. Negative: they change in opposite directions.

df.cov(): covariance matrix

df.corr(): correlation matrix
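
A short sketch with made-up columns shows the two extremes: y rises exactly with x (correlation 1), while z falls exactly as x rises (correlation -1).

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],  # y = 2 * x
    "z": [4, 3, 2, 1],  # z = 5 - x
})

corr = df.corr()  # pairwise correlation coefficients
cov = df.cov()    # pairwise covariances

print(corr.loc["x", "y"], corr.loc["x", "z"])
print(cov.loc["x", "y"], cov.loc["x", "z"])
```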

Handling missing values in pandas

The isnull and notnull functions

Check whether values are null.

dropna

Drops (deletes) missing values.

  • axis: whether to drop rows or columns
  • how: with "any", drop if any value is null; with "all", drop only if all values are null
  • inplace: if True, modify the current df; otherwise return a new df

fillna

Fills in null values.

  • value: the value used for filling; either a single value or a dictionary
  • method: ffill fills with the previous non-null value, bfill fills with the next non-null value
  • axis: fill along rows or columns
  • inplace: same as for dropna
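
These parameters can be tried on an in-memory table without any file. The column names and values in this sketch are hypothetical.

```python
import numpy as np
import pandas as pd

st = pd.DataFrame({
    "name": ["A", None, "B", None],
    "score": [90.0, np.nan, np.nan, np.nan],
    "empty": [np.nan] * 4,
})

st = st.dropna(axis="columns", how="all")  # drops the all-null column "empty"
st = st.dropna(axis="index", how="all")    # drops rows where every value is null
st = st.fillna({"score": 0})               # dictionary: fill value per column
```

For forward and backward filling, recent pandas versions also provide the `ffill()` and `bfill()` methods, which are preferred over the method argument of fillna.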

st = pd.read_excel("./data.xlsx", skiprows=2)  # skip the blank rows at the top
st.dropna(axis="columns", how="all", inplace=True)  # drop columns that are entirely null
st.fillna({"fraction": 0})  # fill nulls in the "fraction" column with 0

Warning: SettingWithCopyWarning

df[condition]["wencha"] = df["bWendu"] - df["yWendu"]
# Equivalent to a get followed by a set (pseudocode):
# df.get(condition).set(wen_cha)

Writes to a pandas DataFrame should be made on the source DataFrame, in a single step:

df.loc[condition, "wen_cha"] = df["bWendu"] - df["yWendu"]  # option 1: one-step write with loc
df_month3 = df[condition].copy()                            # option 2: copy the subset first,
df_month3["wencha"] = df["bWendu"] - df["yWendu"]           # then write to the copy

pandas does not reliably support filtering out a sub-DataFrame first and then writing to it.
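
Both safe patterns can be sketched as follows. The ymd column and the sample rows are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "ymd": ["2018-02-28", "2018-03-01", "2018-03-02"],  # hypothetical date column
    "bWendu": [5, 8, 10],
    "yWendu": [-2, 0, 3],
})
condition = df["ymd"].str.startswith("2018-03")

# Option 1: write to the source DataFrame in one step with loc.
# Rows that do not match the condition get NaN in the new column.
df.loc[condition, "wen_cha"] = df["bWendu"] - df["yWendu"]

# Option 2: take an explicit copy of the subset, then modify the copy.
df_month3 = df[condition].copy()
df_month3["wencha"] = df_month3["bWendu"] - df_month3["yWendu"]
```

With option 2 the original df is left unchanged, which is the point of copying first.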

sort_values(by, ascending=True, inplace=False)

ascending: sort in ascending order if True, descending if False

inplace: whether to modify the original DataFrame

by: the column or columns to sort by
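
A minimal sketch of sorting by one column in both directions; the city labels and temperatures are invented.

```python
import pandas as pd

df = pd.DataFrame({"city": ["a", "b", "c"], "bWendu": [12, 3, 25]})

asc = df.sort_values(by="bWendu")                    # lowest first
desc = df.sort_values(by="bWendu", ascending=False)  # highest first

# With inplace=False (the default) the original order is untouched
print(df["bWendu"].tolist())
```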

String handling in pandas

  • Usage: first get the str attribute of a Series, then call a function on that attribute
  • It can only be used on string columns
  • A DataFrame has no str attribute or string methods

startswith(): tests whether each string starts with the text in parentheses

replace(): replaces text
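
Both methods are called through the str accessor of a Series. A short sketch with made-up date strings:

```python
import pandas as pd

s = pd.Series(["2018-03-01", "2018-04-15", "2018-03-20"])

is_march = s.str.startswith("2018-03")         # boolean Series
compact = s.str.replace("-", "", regex=False)  # literal replacement, no regex

march_only = s[is_march]  # the boolean Series can be used as a filter
```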

Index

df.set_index("userId", inplace=True, drop=False)

drop=False keeps the index column among the regular columns.

df.index

Queries the index.

Index lookups are optimized automatically according to the data type of the index.

The index is aligned automatically.

Using merge

Joins different tables into one table by key.

Similar to a SQL join.
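
A minimal sketch of an inner and a left join on a shared key. The two tables are invented; userId is borrowed from the index example in these notes.

```python
import pandas as pd

users = pd.DataFrame({"userId": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
ratings = pd.DataFrame({"userId": [1, 1, 3, 4], "rating": [5, 3, 4, 2]})

# Inner join: only keys present in both tables
inner = pd.merge(users, ratings, on="userId", how="inner")

# Left join: every user is kept; users without ratings get NaN
left = pd.merge(users, ratings, on="userId", how="left")
```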

Using concat

Batch-merges Excel files of the same format, adds rows to a DataFrame, or adds columns to a DataFrame.
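
The row and column cases can be sketched with small in-memory frames standing in for the Excel files:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})
df3 = pd.DataFrame({"C": [9, 10]})

# Add rows: stack frames with the same columns and renumber the index
rows = pd.concat([df1, df2], ignore_index=True)

# Add columns: place frames side by side, aligned on the index
cols = pd.concat([df1, df3], axis=1)
```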

pandas can also do something similar to SQL's GROUP BY, using groupby.
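
As a sketch of the GROUP BY analogy, the example below groups by wind direction and aggregates the temperature. The data values are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "fengxiang": ["N", "S", "N", "S", "N"],
    "bWendu": [3, 10, 5, 14, 7],
})

# Roughly: SELECT fengxiang, AVG(bWendu) FROM df GROUP BY fengxiang
mean_by_dir = df.groupby("fengxiang")["bWendu"].mean()

# Several aggregates at once
stats = df.groupby("fengxiang")["bWendu"].agg(["min", "max"])
```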

MultiIndex: hierarchical index

