Sentiment Analysis
Views: 6,811
Published: 2019-06-26

This article is about 7,339 characters; estimated reading time 24 minutes.

```python
#!/usr/bin/env python
# Author: Mini
import jieba
import numpy as np  # imported as "n" in the original; unused below
import pymysql

conn = pymysql.connect(host="127.0.0.1", user="root", passwd="wangmianny111",
                       db="galaxy_macau_ad", charset='utf8')

jieba.load_userdict("C:/Users/Administrator/Desktop/tripadvisor_gm/tripadvisor_code_python/galaxy_macau_dict.txt")
jieba.load_userdict("C:/Users/Administrator/Desktop/tripadvisor_gm/tripadvisor_code_python/chinese_sentiment_score/positive_dic.txt")
jieba.load_userdict("C:/Users/Administrator/Desktop/tripadvisor_gm/tripadvisor_code_python/chinese_sentiment_score/negative_dic.txt")


def open_dict(Dict='mini', path=r'/Users/apple888/PycharmProjects/Textming/Sent_Dict/Hownet/'):
    """Load a comma-separated word list into a dict and boost its words in jieba."""
    path = path + '%s.txt' % Dict
    dictionary = open(path, 'r', encoding='utf-8')
    dict = {}
    for line in dictionary:
        seperate_word = line.strip().split(",")
        num = len(seperate_word)
        for i in range(1, num):
            dict[seperate_word[i]] = seperate_word[i]
    try:
        # NOTE: the file iterator is already exhausted by the loop above,
        # so this loop never runs as written.
        for word in dictionary:
            word = word.strip(',')
            jieba.suggest_freq(word, tune=True)  # change the word frequency
    except:
        print("memory ran out!")
    return dict


def sentiment_score_list(dataset):
    # NOTE: this early version only splits the text into clauses; it is
    # shadowed by the full scoring version defined further below.
    seg_sentence = []
    seg_sentence1 = dataset.split('。')
    for item in seg_sentence1:
        seg_sentence2 = item.split(',')
        seg_sentence += seg_sentence2
    print(seg_sentence)
    return seg_sentence


def judgeodd(num):
    if (num % 2) == 0:
        return 'even'
    else:
        return 'odd'


deny_word = open_dict(Dict='deny', path=r'C:/Users/Administrator/Desktop/tripadvisor_gm/tripadvisor_code_python/chinese_sentiment_score/')
posdict = open_dict(Dict='positive', path=r'C:/Users/Administrator/Desktop/tripadvisor_gm/tripadvisor_code_python/chinese_sentiment_score/')
negdict = open_dict(Dict='negative', path=r'C:/Users/Administrator/Desktop/tripadvisor_gm/tripadvisor_code_python/chinese_sentiment_score/')

"""
degree_word = open_dict(Dict='程度级别词语', path=r'C:/Users/Administrator/Desktop/Textming/')
mostdict = degree_word[degree_word.index('extreme')+1 : degree_word.index('very')]  # weight 4: multiply the sentiment word by 4
verydict = degree_word[degree_word.index('very')+1 : degree_word.index('more')]    # weight 3
moredict = degree_word[degree_word.index('more')+1 : degree_word.index('ish')]     # weight 2
ishdict = degree_word[degree_word.index('ish')+1 : degree_word.index('last')]      # weight 0.5
"""

combine_dict = {}
for line in open("C:/Users/Administrator/Desktop/tripadvisor_gm/tripadvisor_code_python/chinese_sentiment_score/synonyms.txt", "r", encoding='utf-8'):
    seperate_word = line.strip().split(",")
    jieba.suggest_freq(seperate_word, tune=True)  # change the frequency
    num = len(seperate_word)
    for i in range(1, num):
        combine_dict[seperate_word[i]] = seperate_word[0]
print("loading dic and changing freq finished!")


def sentiment_score_list(dataset):
    print(dataset)
    seg_sentence = []
    seg_sentence1 = dataset.split('。')
    count1 = []
    count2 = []
    for item in seg_sentence1:
        seg_sentence2 = item.split(',')
        seg_sentence += seg_sentence2
    print(seg_sentence)

    poscount_service1 = 0  # value of the current positive word
    poscount_service2 = 0  # positive value after taking deny words into account
    negcount_service1 = 0
    negcount_service2 = 0
    score_service = 0      # final score
    s = 0                  # number of sentiment words seen
    for sen in seg_sentence:  # traverse each clause of the comment
        segtmp = jieba.lcut(sen, cut_all=False)  # cut the clause into a list of words
        i = 0  # position of the word being scanned
        a = 0  # position just past the last sentiment word
        for word in segtmp:
            if word == "demond_show":
                print("the customer is talking about " + word)
                for word in segtmp:
                    print(word)
                    if word in posdict:  # positive word
                        print("this customer's attitude is positive!")
                        poscount_service1 = 5
                        s += 1
                        c = 0
                        for w in segtmp[a:i]:  # scan the words before the sentiment word
                            if w in deny_word:
                                c += 1
                        if judgeodd(c) == 'odd':  # an odd number of deny words flips the weight
                            poscount_service1 = 1
                            poscount_service2 += poscount_service1
                            poscount_service1 = 0
                        else:
                            poscount_service2 = poscount_service1 + poscount_service2
                            poscount_service1 = 0
                        a = i + 1  # move past the sentiment word
                        print(poscount_service2)
                    elif word in negdict:  # negative sentiment: same as above
                        negcount_service1 = 1
                        s += 1
                        d = 0
                        for w in segtmp[a:i]:
                            if w in deny_word:
                                d += 1
                        if judgeodd(d) == 'odd':
                            negcount_service1 = 5
                            negcount_service2 += negcount_service1
                            negcount_service1 = 0
                        else:
                            negcount_service2 += negcount_service1
                            negcount_service1 = 0
                        a = i + 1
                    else:
                        pass
                    i += 1  # advance the scan position
            else:
                print("not talking about this certain topic!")
    print("s" + str(s))
    if s == 0:
        pass
    else:
        score_service = (poscount_service2 + negcount_service2) / s
        score_service = float('%.1f' % score_service)
        count1.append(score_service)
        # sql = "UPDATE tripadvisor_chinese SET service = '" + score_service + "' WHERE ID = '" + ID + "' ;"
        # conn.query(sql)
        # conn.commit()
    print(count1)
    return score_service


def sentiment_score(senti_score_list):
    # ID and index are module-level variables set in the loop at the bottom
    print(ID + ":senti_score_list:" + str(senti_score_list))
    if senti_score_list == 0:
        pass
    else:
        sql = ("UPDATE tripadvisor_chinese SET demond_show = '" + str(senti_score_list)
               + "' WHERE customer_num = '" + str(index) + "' ;")
        conn.query(sql)
        conn.commit()
        print("success!")


"""
test1 = '兔子一号 我中意澳门银河,尤其喜欢银河酒店的房间还有服务,服务特别周到,服务特别好。'
test2 = '兔子二号 澳门银河的服务一点也不好,很差劲。'
test3 = '兔子三号 服务不能说不好,也不是很差。'
"""

"""
# One-off preprocessing: replace synonyms in the raw comments and append
# the normalized text to combine_chinese.txt.
data_combine = ""
for chinese_data in open("C:/Users/Administrator/Desktop/tripadvisor_gm/tripadvisor_code_python/chinese_sentiment_score/tripadvisor_chinese.txt", "r", encoding='utf-8'):
    chinese_comment = chinese_data.strip().split("\n")
    print(chinese_comment)
    for comment in chinese_comment:
        print(comment)
        combine_sentence = ""
        words_1 = jieba.cut(comment)
        for word in words_1:
            if word in combine_dict:
                word = combine_dict[word]
            combine_sentence += word
        print(combine_sentence)
        data_combine += combine_sentence + "\n"
        print(data_combine)
f_combine = open("C:/Users/Administrator/Desktop/tripadvisor_gm/tripadvisor_code_python/chinese_sentiment_score/combine_chinese.txt", "a", encoding="utf_8")
f_combine.write(data_combine)
print(data_combine)
"""

index = 1
for combine_data in open("C:/Users/Administrator/Desktop/tripadvisor_gm/tripadvisor_code_python/chinese_sentiment_score/combine_chinese.txt", "r", encoding='utf-8'):
    seperate_sentice = combine_data.split("\n")
    print(seperate_sentice)
    for item in seperate_sentice:
        if item == "":
            pass
        else:
            ID_list = item.strip().split('\t')
            ID = ID_list[0].replace('"', '')
            print("ID" + ID)
            service_score = sentiment_score(sentiment_score_list(item))
            # print the stored result instead of re-running the scoring and
            # database update a second time, as the original did
            print(service_score)
            print("index:" + str(index))
            index += 1
```
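The first step of `sentiment_score_list` above splits each review into clauses on the Chinese full stop `。` and comma `,`. A minimal standalone sketch of that step (the function name `split_clauses` is mine, not from the script; unlike the original, it also drops the empty fragments that trailing punctuation leaves behind):

```python
def split_clauses(text):
    """Split a Chinese review into clauses on 。 and ,, dropping empty fragments."""
    clauses = []
    for sentence in text.split('。'):
        for clause in sentence.split(','):
            clause = clause.strip()
            if clause:  # skip the empty strings left after trailing punctuation
                clauses.append(clause)
    return clauses

print(split_clauses('服务特别周到,服务特别好。房间很干净。'))
# → ['服务特别周到', '服务特别好', '房间很干净']
```

Keeping the empty fragments, as the original does, only adds clauses that segment to nothing; filtering them out saves useless passes through the scoring loop.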
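The core scoring idea in the loop above is: a positive word is worth 5, a negative word 1, and an odd number of "deny" (negation) words between the previous sentiment word and the current one flips the weight, which is what `judgeodd` checks. A simplified self-contained sketch of that rule (the sample word sets here are illustrative, not the script's dictionary files):

```python
POS = {'好', '周到'}          # sample positive words (illustrative)
NEG = {'差', '差劲'}          # sample negative words (illustrative)
DENY = {'不', '没有', '不能'}  # sample negation ("deny") words (illustrative)

def clause_score(words):
    """Score a tokenized clause: positive word = 5, negative word = 1,
    with the weight flipped when preceded by an odd number of negations."""
    total, hits, last = 0, 0, 0
    for i, w in enumerate(words):
        if w in POS or w in NEG:
            denies = sum(1 for x in words[last:i] if x in DENY)
            base = 5 if w in POS else 1
            if denies % 2 == 1:        # odd count of negations flips polarity
                base = 1 if w in POS else 5
            total += base
            hits += 1
            last = i + 1               # only scan words after this sentiment word
    return round(total / hits, 1) if hits else 0

print(clause_score(['服务', '特别', '好']))  # → 5.0
print(clause_score(['服务', '不', '好']))    # → 1.0
```

This asymmetry makes the result behave like a 1-to-5 rating rather than a signed polarity: "不差" (not bad) scores as high as "好", while "不好" scores as low as "差".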
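The `synonyms.txt` loop maps every variant on a line back to the line's first word, so different ways of naming the same entity collapse into one token before scoring. A standalone sketch of that normalization (the sample synonym lines are my illustration of the file format; the real script reads them from disk and segments the text with jieba first):

```python
# Illustrative synonym lines: canonical form first, variants after.
synonym_lines = [
    '澳门银河,银河酒店,银河',
    'demond_show,秀',
]

combine_dict = {}
for line in synonym_lines:
    parts = line.strip().split(',')
    for variant in parts[1:]:
        combine_dict[variant] = parts[0]  # map each variant to the canonical word

def normalize(tokens):
    """Replace each token with its canonical form when one is known."""
    return [combine_dict.get(t, t) for t in tokens]

print(normalize(['我', '喜欢', '银河', '的', '秀']))
# → ['我', '喜欢', '澳门银河', '的', 'demond_show']
```

This is why the scoring loop can key on the single token `"demond_show"`: after normalization, every variant of the topic appears under one canonical name.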

Reposted from: https://www.cnblogs.com/rabbittail/p/8336271.html
