
Counting Word Frequencies with Python

2023-11-04 (Saturday) / 0 comments / 0 likes / 28 views / 1303 characters

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import collections
import re
from collections import Counter

import jieba

# Load the stop words, one per line (files are assumed to be UTF-8 encoded)
# stopwords = {'的', '包括', '等', '是'}
with open("stopwords.txt", encoding="utf-8") as f:
    stopwords = {line.strip() for line in f}

# Input file path
bill_path = r'article_nohtml.txt'
# Output file path
bill_result_path = r'result.txt'

# Read the input file
with open(bill_path, 'r', encoding='utf-8') as fr:
    all_the_text = fr.read()

# Strip special characters: double quotes, commas, and periods
all_the_text = re.sub(r'"|,|\.', "", all_the_text)

# Segment the text into words
data = jieba.cut(all_the_text)

# Count how often each word occurs
data = dict(Counter(data))


# Sort by word frequency, highest first
def sort_by_count(d):
    d = collections.OrderedDict(sorted(d.items(), key=lambda t: -t[1]))
    return d


data = sort_by_count(data)

# Write the results; the with block closes the file on exit
with open(bill_result_path, 'w', encoding='utf-8') as fw:
    for k, v in data.items():
        # Skip stop words
        if k not in stopwords:
            fw.write('%s:%d\n' % (k, v))
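The counting and sorting steps above can also be tried without any files. The following is only a minimal sketch, not the script itself: the function name `word_frequencies` and the sample tokens are hypothetical, and the hard-coded list stands in for the output of `jieba.cut`. It uses `Counter.most_common()`, which returns (word, count) pairs ordered from most to least frequent, in place of the manual `sorted` call.

```python
from collections import Counter


def word_frequencies(tokens, stopwords):
    """Count tokens, skipping stop words, and return (word, count)
    pairs sorted by descending count. Hypothetical helper."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common()


# Hypothetical sample data standing in for the output of jieba.cut(...)
tokens = ["秋天", "的", "落叶", "的", "秋天", "是", "秋天"]
stopwords = {"的", "是"}

result = word_frequencies(tokens, stopwords)
for word, count in result:
    # Same "word:count" line format that the script writes to result.txt
    print("%s:%d" % (word, count))
```

Here the stop words are filtered out before counting rather than at write time; the written lines are the same either way, but the counter stays smaller.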

Screenshot of the run result
