# 适合时间序列数据的计算脚本

### SQL

SQL历史悠久用户众多，在其模型框架内早已发展到极限，几乎每种简单运算都能找到对应的SQL解法，这其中就包括有序运算。

``select transDate,price,price/lag(price) over(order by transDate)-1 comp from stock1001 ``

``````With A as(SELECT    t1.transDate,t1.price, COUNT(*) AS rk
FROM stock1001 AS t1,
stock1001 AS t2
WHERE t1.transDate   >t2.transDate or (t1.transDate=t2.transDate and t1.price<=t2.price)
GROUP BY  t1.transDate, t1.price
ORDER BY rk)
select t1.transDate,(t1.price/t2.price-1) comp  from A as t1 left join  A as t2on t1.rk=t2.rk+1 ``````

SQL计算中位数的代码：

``````With A as (select studentdid,math,   row_number() over (order by math) rk from scores),
B as  (select count(1)  L from scores)
select avg(math)  from A where   rk in (
select case when   mod(L,2)=1 then   ((L+1) div 2)  else ( L div 2) end no from B
union
select case when   mod(L,2)=1 then  ((L+1) div 2)  else (L div 2)+1 end  from B)) ``````

``````select max(continue_inc_days)
from (select count(*) continue_inc_days
from (select sum(inc_de_flag) over(order by transDate) continue_de_days
from (select transDate,
case when
price>LAG(price)   over(order by transDate)
then 0 else 1 end inc_de_flag
from AAPL) )
group by continue_de_days) ``````

### Python Pandas

Pandas是Python的结构化计算库，常被用作时间序列数据的计算脚本。

``````import pandas as pd
stock1001 ['comp'] = stock1001.math/ stock1001.shift(1).math-1 ``````

``````…
df=pd.read_csv('d:/scores.csv')       #return  as a DataFrame  math=df['math']
L=len(df)
if L % 2 == 1:
result= math[int(L / 2)]
else:
result= (math[int(L / 2 - 1)] +   math[int(L / 2)]) / 2  print(result) |``````

``````…
aapl = pd.read_sql_query("select price from AAPL order by   transDate", conn)
continue_inc_days=0 ; max_continue_inc_days=0
for i in aapl['price'].shift(0)>aapl['price'].shift(1):
continue_inc_days =0 if   i==False else continue_inc_days +1
max_continue_inc_days = continue_inc_days   if max_continue_inc_days < continue_inc_days else max_continue_inc_days
print(max_continue_inc_days)
conn.close() ``````

Pandas在有序计算方面的确比SQL容易些，但遇到更复杂的情况，Pandas也会变得很繁琐，下面试举两例。

``````import pandas as pd
sale_info.sort_values(by=‘Amount’,inplace=True,ascending=False)
half_amount = sale_info[‘Amount’].sum()/2
vip_list = []
amount = 0
for client_info in sale_info.itertuples():
amount += getattr(client_info, ‘Amount’)
if amount < half_amount:
vip_list.append(getattr(client_info, ‘Client’))
else:
vip_list.append(getattr(client_info, ‘Client’))        breakprint(vip_list) |``````

``````import pandas as pd
CL = stock1001[‘CL’]
CL_psort = CL.argsort()[::-1].iloc[:3].values
CL_psort_shift1 = CL_psort-1
CL_rise = CL[CL_psort].values/CL[CL_psort_shift1].values-1
max_3 = stock1001.loc[CL_psort].reset_index(drop = True)
max_3[‘RISE’] = CL_rise
print(max_3)``````

### esProc

A B
1 =file(“d:/stock1001.csv”).import@tc() /读csv文件
2 =A1.derive(price/price[-1]-1:comp) /用相对位置计算比上期

A
1
2 =L=A1.len()
3 =if(A2%2==0,A1([L/2,L/2+1]).avg(math),A1((L+1)/2).math)

A
1
2 =a=0,A1.max(a=if(price>price[-1],a+1,0))

esProc是更为专业的结构化计算语言，即使遇到更复杂的有序计算，也能较为轻松地实现。

A B
1 =demo.query(“select client,amount from sales”).sort(amount:-1) 取数并逆序排序
2 =A1.cumulate(amount) 计算累计序列
3 =A2.m(-1)/2 最后的累计值即是总和
4 =A2.pselect(~>=A3) 超过一半的位置
5 =A1(to(A4))

A B
1 =file(“d:/stock1001.csv”).import@tc() /取数
2 =A1.ptop(-3,price) /股价最高的3天的位置
3 =A1.calc(A2,price/price[-1]-1) /计算这三天的涨幅
4 =A1(A2).new(transDate,price,A3(#):comp) /用列拼出二维表

(=￣ω￣=)··· 暂无内容！

15

0

2

4