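Over the past few days I have been revisiting figures from the early Han dynasty, and Han Xin (韩信) strikes me as one of the most regrettable cases. I did some rough processing of the 《淮阴侯列传》 (Biography of the Marquis of Huaiyin) as translated by Mr. 王学孟. I downloaded a portrait of Han Xin from the web. It started out as a pencil sketch and gave very poor results as a mask, so I converted the image to black and white and filled the blank areas with black, which made the output barely presentable.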
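The post does not say which tool was used for the black-and-white cleanup. As a rough illustration only, the thresholding step could look like the sketch below. It assumes the picture is already available as rows of 0-255 grayscale values; the function name `binarize` and the threshold of 128 are hypothetical choices, not the author's. The idea matters for the mask because `WordCloud` leaves pure-white mask pixels empty and draws only on the rest.

```python
def binarize(gray, threshold=128):
    """Map each 0-255 grayscale value to 0 (black) or 255 (white).

    `gray` is a list of rows of pixel values; the threshold is an
    assumed default, to be tuned per image.
    """
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


# Tiny made-up 2x3 "image": light pencil strokes and near-white paper.
sketch = [[250, 90, 240],
          [30, 200, 10]]
mask = binarize(sketch)
print(mask)
```

A real image would more likely be thresholded as an array before being passed as `mask`; the nested lists here just keep the sketch free of extra dependencies.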
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import jieba
import matplotlib.pyplot as plt
from imageio import imread
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# Read the source text (the file is assumed to be UTF-8 encoded).
with open('淮阴侯列传.txt', encoding='utf-8') as f:
    text = f.read()

# back_color = imread('hanxin.png')
back_color = imread('韩信.png')  # mask image; pure-white pixels stay empty

wc = WordCloud(background_color='white',
               max_words=100000,
               mask=back_color,
               min_font_size=2,
               max_font_size=100,
               # set.add() returns None, so build a new set instead
               stopwords=STOPWORDS | {'韩信'},
               font_path='simfang.ttf',
               width=30000,   # width/height are ignored when a mask is given
               height=24000,
               random_state=42)


def process_words(text):
    """Segment the text with jieba and drop words listed in stopwords.txt."""
    jieba.add_word('韩信')  # keep the name as a single token
    # Assumes one stopword per line; a set gives exact-match lookups
    # instead of substring matches against the whole file.
    with open('stopwords.txt', encoding='utf-8') as f:
        stop_set = {line.strip() for line in f}
    words_list = []
    for word in jieba.cut(text, cut_all=False):
        word = word.strip()
        if word and word not in stop_set:
            words_list.append(word)
    return ' '.join(words_list)


text = process_words(text)
wc.generate(text)
image_colors = ImageColorGenerator(back_color)

# Word cloud in the default colors
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
# Same layout, recolored from the mask image
plt.figure()
plt.imshow(wc.recolor(color_func=image_colors))
plt.axis('off')
plt.show()
Marquis of Huaiyin word cloud (color)
Marquis of Huaiyin word cloud (black and white)
# Show the mask image itself
plt.imshow(back_color)
plt.show()
Marquis of Huaiyin (mask image)
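# Plain word cloud without the mask; the variable is named plain_wc so it
# does not shadow the wordcloud package name.
plain_wc = WordCloud(max_font_size=100, font_path='simfang.ttf').generate(text)
plt.figure()
plt.imshow(plain_wc, interpolation='bilinear')
plt.axis('off')
plt.show()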
Word cloud of the Biography of the Marquis of Huaiyin
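The words that stand out most in the text are 汉王 (the King of Han), 天下 (the realm), and 军队 (the army), which is what common sense would suggest. Because the processing was fairly rough, function words such as '不能' (cannot) and '他们' (they) also come out rather large.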
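One possible way to tame words like '不能' and '他们' is to filter the segmented tokens by exact match against a stopword set before counting them. The following is a minimal sketch, not the processing used for the images above: `EXTRA_STOPWORDS`, `top_words`, and the sample token list are all hypothetical, and only the two words named in the text are included in the set.

```python
from collections import Counter

# Hypothetical extra stopwords; the post only names these two.
EXTRA_STOPWORDS = {'不能', '他们'}


def top_words(tokens, stopwords=EXTRA_STOPWORDS, min_len=2, n=3):
    """Count tokens after dropping stopwords and tokens shorter than min_len."""
    cleaned = (t.strip() for t in tokens)
    kept = [t for t in cleaned if len(t) >= min_len and t not in stopwords]
    return Counter(kept).most_common(n)


# Made-up token list standing in for jieba output.
sample = ['汉王', '不能', '天下', '汉王', '他们', '军队', '的', '天下', '汉王']
print(top_words(sample))  # [('汉王', 3), ('天下', 2), ('军队', 1)]
```

A frequency dictionary built this way could be handed to `WordCloud.generate_from_frequencies`, which skips the library's own tokenization, though whether that gives a better picture here is untested.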