您现在的位置：程式師世界 >> 編程語言 > >> 更多編程語言 >> Python

Pandas vs. SQL 1_ nanyangjx

編輯：Python

The present python It is widely used in data processing and quantitative analysis data mining , This article will focus on python in pandas Under bag DataFrame The use of this data type is described in detail , And its equivalent in data processing SQL The contrast in , Don't talk much , On and off ！！！
https://blog.csdn.net/weixin_39818658/article/details/79944925

keyword ：pandas、DataFrame、SQL

1、 About filtering data table columns

Yes DataFrame（ That is, a two-dimensional data table ）data：

Use

data=DataFrame(np.arange(9).reshape(3,3),columns=[‘a’,‘b’,‘c’])

Statement to generate the DataFrame

data :

a b c

0 0 1 2

1 3 4 5

2 6 7 8

If we only take data Of a Column Use a slicing operation directly （ notes ： The selection slice operation on the column is a two-layer bracket ）：

data[[‘a’]]

--------->

0 0

1 3

2 6

Corresponding SQL Statement for ：

select a from data

If we want to take a and b The two fields are also slicing operations ：

data[[‘a’,‘b’]]

------>

a b

0 0 1

1 3 4

2 6 7

Corresponding SQL Statement for ：

select a,b from data

First, let's create two here DataFrame：

Basic information data sheet of the company

companydata=DataFrame({'ID':['aa','bb','cc'],
'name':[' company A',' company B',' company C'],
'establish_year':['2001','2002','2003'],
'city':['ShangHai','BeiJing','ShenZhen']})

----->

ID city establish_year name

0 aa ShangHai 2001 company A

1 bb BeiJing 2002 company B

2 cc ShenZhen 2003 company C

List of products issued by the company ：

productdata=DataFrame({'ID':['aa','aa','bb','bb','cc'],
'productname':['stock_A','bond_A','stock_B','bond_B','stock_C']
})

--------->

ID productname

0 aa stock_A

1 aa bond_A

2 bb stock_B

3 bb bond_B

4 cc stock_C

2、 Index of data 、 selection 、 Filter

（1） Single condition screening

First select ’ company A’ All the information about , The operation is as follows ：

companydata[companydata[‘name’]==‘ company A’]
---------------------------------------------------------------------------->
ID city establish_year name

0 aa ShangHai 2001 company A
amount to SQL：
select * from companydata where name=’ company A’

（2） Multi criteria screening （ notes ： The multi criteria filter criteria should be written in brackets ）

selection ’ company B’ also ’city’ by ’BeiJing’, The operation is as follows ：

companydata[(companydata[‘name’]‘ company B’)&(companydata[‘city’]‘BeiJing’)]

-------->

ID city establish_year name

1 bb BeiJing 2002 company B

amount to SQL:

select * from companydata where name=’ company A’ and city=’BeiJing’

selection ’ID’ by ’aa’ perhaps ’cc’ All the information about , The operation is as follows ：

companydata[(companydata[‘ID’]‘aa’)|(companydata[‘ID’]‘cc’)]

------>

ID city establish_year name

0 aa ShangHai 2001 company A

2 cc ShenZhen 2003 company C

amount to SQL:

select * from companydata where ID=’aa’ or ID=’cc’

perhaps

companydata[companydata[‘ID’].isin([‘aa’,‘cc’])]

------->

 ID city establish_year name

0 aa ShangHai 2001 company A

2 cc ShenZhen 2003 company C

amount to SQL：

select * from companydata where ID in (‘aa’,’cc’)

（3） Sometimes we don't need all the fields after filtering, but only one or several fields ,

for example ：

View company A also ’ID’ by ’aa’ Of ’ Year of establishment ’ and ’ city ’

We need to apply it to DataFrame An index method of ix, The operation is as follows ：

companydata.ix[(companydata[‘name’]‘ company A’)&(companydata[‘ID’]‘aa’),[‘city’,‘establish_year’]]

---------->

 city establish_year

0 ShangHai 2001

amount to SQL:

select city,establish_year from companydata where ID=’aa’ and name=’ company A’

3、 Label according to different data , That is to say SQL Medium case when … then … operation

for example ： When the city where the company is located ’city’ ,’ShangHai’ When is ’SH’,’BeiJing’ When is ’BJ’,’ShenZhen’ When is ’SZ’, The operation is as follows ：

fuc=lambda x:'SH' if x=='ShangHai' else 'BJ' if x=='BeiJing' else 'SZ' if x=='ShenZhen' else np.NaN
companydata['citylabel']=companydata['city'].apply(fuc)
print(companydata)

（ I can only think of such a way of writing about this , If there is a clever way , Please grant me your advice ）

------->

ID city establish_year name citylabel

0 aa ShangHai 2001 company A SH

1 bb BeiJing 2002 company B BJ

2 cc ShenZhen 2003 company C SZ

amount to SQL：

select
ID,
city,
establish_year,
name,
case ‘city’ when ‘'ShangHai' ’ then ‘SH’
when 'BeiJing' then ‘BJ’
when ‘'ShenZhen' ’ then ‘SZ’ else null
end as citylabel
from companydata

4、 Aggregation of data

example ：

surface productdata Yes ID Grouping , And calculate each ID There are several financial products

productdata[productname].groupby(productdata[‘ID’]).count()

------>

aa 2

bb 2

cc 1

Name: ID, dtype: int64

amount to SQL：

select count(productname) from productdata group by ID

productdata[productname].groupby([productdata[‘ID’],productdata[‘productname’]]).count()