Think in yasam: July 2017

Monday, July 31, 2017

source command not found

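On Debian/Ubuntu, /bin/sh is a symlink to dash, which has no "source" builtin. Point /bin/sh at bash instead:

$ ls -l `which sh`
/bin/sh -> dash

$ sudo dpkg-reconfigure dash   # select "No" when asked
[...]

$ ls -l `which sh`
/bin/sh -> bash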

reference:
https://stackoverflow.com/questions/13702425/source-command-not-found-in-sh-shell
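
The symlink check above can also be scripted. This is only a minimal sketch, assuming a POSIX system; sh_target is a hypothetical helper name, not part of any tool mentioned in this post.

```python
import os


def sh_target(path="/bin/sh"):
    """Return the name of the binary that `path` finally resolves to."""
    # realpath follows the whole symlink chain, e.g. /bin/sh -> dash
    return os.path.basename(os.path.realpath(path))


if __name__ == "__main__":
    target = sh_target()
    print("/bin/sh ->", target)
    if target != "bash":
        print("'source' may be unavailable; use '.' or run the script with bash")
```

Using "." instead of "source" in the script is the portable alternative if changing /bin/sh is not an option.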

caffe

Converting images to a DB

convert_imageset [FLAGS] ROOTFOLDER/ LISTFILE DB_NAME

create_filelist.sh

#!/usr/bin/env sh
# Build train.txt with one "<file name> <label>" line per image.
DATA=examples/images
echo "Create train.txt..."
rm -rf $DATA/train.txt
# Quote the patterns so the shell does not expand them before find sees them.
find $DATA -name '*cat.jpg' | cut -d '/' -f3 | sed "s/$/ 1/" >> $DATA/train.txt
find $DATA -name '*bike.jpg' | cut -d '/' -f3 | sed "s/$/ 2/" >> $DATA/tmp.txt
cat $DATA/tmp.txt >> $DATA/train.txt
rm -rf $DATA/tmp.txt
echo "Done.."
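
The same list file can be produced from Python. This is a rough sketch rather than a drop-in replacement: build_filelist and write_filelist are hypothetical helper names, and the sketch always writes the bare file name, whereas the shell version takes the third path component.

```python
import os


def build_filelist(data_dir, labels):
    """Return 'filename label' lines for files whose names end with a known suffix.

    labels maps a file-name suffix (e.g. 'cat.jpg') to an integer class label.
    """
    lines = []
    for suffix, label in labels.items():
        for _root, _dirs, files in os.walk(data_dir):
            for name in sorted(files):
                if name.endswith(suffix):
                    lines.append("%s %d" % (name, label))
    return lines


def write_filelist(data_dir, labels, out_name="train.txt"):
    """Write the list file into data_dir and return the lines written."""
    lines = build_filelist(data_dir, labels)
    with open(os.path.join(data_dir, out_name), "w") as f:
        f.write("".join(line + "\n" for line in lines))
    return lines


if __name__ == "__main__":
    # Same suffix-to-label mapping as the shell script.
    write_filelist("examples/images", {"cat.jpg": 1, "bike.jpg": 2})
```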

create_lmdb.sh

#!/usr/bin/env sh
DATA=examples/images
rm -rf $DATA/img_train_lmdb
build/tools/convert_imageset --shuffle \
--resize_height=256 --resize_width=256 \
/home/xxx/caffe/examples/images/ $DATA/train.txt  $DATA/img_train_lmdb

Computing the mean
/opt/caffe/build/tools/compute_image_mean ./my_data/img_train_lmdb ./my_caffe/my_mean.binaryproto

convert_mean.py
#!/usr/bin/env python
# Convert a Caffe mean.binaryproto file to a NumPy .npy file.
import sys

import numpy as np
import caffe

if len(sys.argv) != 3:
    print("Usage: python convert_mean.py mean.binaryproto mean.npy")
    sys.exit()

blob = caffe.proto.caffe_pb2.BlobProto()
with open(sys.argv[1], 'rb') as f:
    bin_mean = f.read()
blob.ParseFromString(bin_mean)
arr = np.array(caffe.io.blobproto_to_array(blob))
npy_mean = arr[0]  # keep the first (only) entry along the leading axis
np.save(sys.argv[2], npy_mean)

Thursday, July 27, 2017

pyodbc connect to sql server

Install:

https://github.com/mkleehammer/pyodbc/wiki/Install

code:
import pyodbc
# connect to db
conn = pyodbc.connect(
    r'DRIVER={ODBC Driver 13 for SQL Server};'
    r'SERVER=127.0.0.1;'
    r'DATABASE=DB_table;'
    r'UID=yasam;'
    r'PWD=password'
)
cursor = conn.cursor()
sqlInsert = "INSERT INTO [dbo].[test_table](RecordID,Model,SubmitDate,CountryCode,Score,Comment) VALUES "

# Assumed: df is a list of rows, each a list in the column order above.
tList = []  # VALUES tuples waiting to be sent
for i, d in enumerate(df):
    print(i)
    # truncate to the column sizes in the db
    if len(d[5]) > 1024:
        d[5] = d[5][:1024]
    if len(d[1]) > 32:
        d[1] = d[1][:32]
    d[5] = d[5].replace("'", "''")  # Comment, escape ' for sql
    d[1] = d[1].replace("'", "''")  # Model (column index 1)
    d[4] = str(d[4])  # float to str
    temp = "(" + ",".join(["N'" + dd + "'" for dd in d]) + ")"  # N prefix for Unicode strings
    tList.append(temp)
    if i == len(df) - 1:
        text = ','.join(tList)
        cursor.execute(sqlInsert + text)  # last insert
    elif i % 10 == 9:
        text = ','.join(tList)
        print(text)
        cursor.execute(sqlInsert + text)  # batch insert of 10 rows
        tList = []
        temp = ''
conn.commit()

reference:

pyodbc usage

https://my.oschina.net/zhengyijie/blog/35587

Inserting multiple rows in a single SQL query

https://stackoverflow.com/questions/452859/inserting-multiple-rows-in-a-single-sql-query

Pyodbc query string quote escaping

Escape a single quote by doubling it (''), or use ? placeholders instead.

Unicode issues (not encountered)

http://blog.csdn.net/samed/article/details/50539742
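
The ? placeholder approach mentioned in the references avoids manual quote escaping. The sketch below uses the standard-library sqlite3 module as a stand-in so it runs without a SQL Server; pyodbc uses the same ? (qmark) placeholder style and also provides cursor.executemany. The three-column table and the helper names chunks and insert_rows are simplifications made up for this example.

```python
import sqlite3


def chunks(rows, size):
    """Yield successive slices of `rows` with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_rows(conn, rows, batch_size=10):
    """Insert rows in batches using ? placeholders; the driver does the quoting."""
    sql = "INSERT INTO test_table (RecordID, Model, Comment) VALUES (?, ?, ?)"
    cursor = conn.cursor()
    for batch in chunks(rows, batch_size):
        cursor.executemany(sql, batch)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the pyodbc connection
    conn.execute("CREATE TABLE test_table (RecordID INTEGER, Model TEXT, Comment TEXT)")
    # Comments containing ' need no manual escaping here.
    rows = [(i, "model-%d" % i, "it's row %d" % i) for i in range(25)]
    insert_rows(conn, rows)
    print(conn.execute("SELECT COUNT(*) FROM test_table").fetchone()[0])
```

Values are passed separately from the SQL text, so the replace("'", "''") step is not needed with this approach.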

Thursday, July 13, 2017

python3 gensim id2token not found

import glob
import string
from collections import defaultdict

import pandas as pd
from gensim import corpora

# Assumed definition (not shown in the original post): strip ASCII punctuation.
trans_table = str.maketrans('', '', string.punctuation)

rspList = sorted(glob.glob('./data/*'))
df = []
for rsp in rspList:
    data = pd.read_csv(rsp)
    df.append(data)
df = pd.concat(df)
stoplist = set('i am you are he she is a for of the and to in'.split())
sents = df['translated_feedback'][df['translated_feedback'] != '\\N']  # remove no response
texts = [[word for word in sent.translate(trans_table).lower().split()
          if word not in stoplist] for sent in sents.values]  # remove stopwords and punctuation
texts = list(filter(None, texts))  # filter empty list
#print(texts)
feq = defaultdict(int)
for text in texts:
    for token in text:
        feq[token] += 1

texts = [[token for token in text if feq[token] > 1] for text in texts]  # remove low frequency

dic = corpora.Dictionary(texts)  # build dictionary

dic.save('./dictionary.dict')
#print(dic)
corpus = [dic.doc2bow(text) for text in texts]  # build bag of words corpus
corpora.MmCorpus.serialize('./corpus.mm', corpus)
#print(corpus)

When I try to read id2token from dic, it is an empty dictionary {}:
dic.id2token
{}

gensim builds id2token lazily, so I have to traverse (iterate over) dic once first:
for k, v in dic.items():
    pass
dic.id2token
{0:'yes',1:'got',2:'it'}
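
The behaviour above is a lazy reverse mapping: the id-to-token dictionary is only filled in when something looks up a token by id. The toy class below illustrates the pattern; LazyDictionary is a hypothetical stand-in written for this note, not gensim's actual implementation.

```python
class LazyDictionary:
    """Toy stand-in (not gensim code): token2id is eager, id2token is lazy."""

    def __init__(self, tokens):
        self.token2id = {}
        for tok in tokens:
            # A new token gets the next free integer id.
            self.token2id.setdefault(tok, len(self.token2id))
        self.id2token = {}  # stays empty until a lookup by id happens

    def __getitem__(self, tokenid):
        # Rebuild the reverse map only when it is out of date.
        if len(self.id2token) != len(self.token2id):
            self.id2token = {v: k for k, v in self.token2id.items()}
        return self.id2token[tokenid]

    def keys(self):
        return list(self.token2id.values())

    def items(self):
        # Each self[i] goes through __getitem__, which fills id2token.
        return [(i, self[i]) for i in self.keys()]


if __name__ == "__main__":
    dic = LazyDictionary(["yes", "got", "it", "yes"])
    print(dic.id2token)  # still empty at this point
    dic[0]               # a single lookup by id is enough
    print(dic.id2token)
```

In this sketch a single lookup such as dic[0] populates the mapping, so a full loop over items() is not strictly required.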

Monday, July 31, 2017