Similar-image search with Python and OpenCV - Sabrou-mal サブロウ丸

The goal here is to find not just exact duplicates but, more broadly, images that "look alike".

Installing OpenCV

The sheer number of pages on the web about installing OpenCV hints at how difficult (or rather, how fiddly) it is, and sure enough, I struggled with the installation too.

What finally worked for me:
・uninstall the numpy and opencv3 packages I had installed with brew, then
・run pip3 install opencv-python
After that, OpenCV was usable from Python. I hope this helps.
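A quick way to confirm that the install worked is to try importing cv2. The helper name opencv_status below is made up for this illustration; it is only a minimal sketch.

```python
def opencv_status():
    """Return a short message saying whether the cv2 module can be imported."""
    try:
        import cv2
    except ImportError as exc:
        # cv2 is missing or broken; `pip3 install opencv-python` is what worked above
        return "cv2 is not importable ({})".format(exc)
    return "cv2 {} is available".format(cv2.__version__)


print(opencv_status())
```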

Computing image similarity

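Image similarity is computed from the similarity of the images' histograms. A histogram here is the distribution of brightness values in an image. (See ヒストグラム - CyberLibrarian)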

OpenCV's calcHist() function computes an image's histogram.
Two histograms can then be compared with the compareHist() function. (Handy, isn't it?)
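To make the idea concrete, here is a minimal pure-Python sketch of what a 256-bin histogram of one 8-bit channel counts. The function name intensity_histogram and the sample pixel values are made up for illustration. calcHist() does this counting on real image data and returns the counts as an array rather than a list.

```python
def intensity_histogram(pixels, bins=256):
    """Count how many pixels take each integer value in [0, bins)."""
    hist = [0] * bins
    for value in pixels:
        hist[value] += 1
    return hist


# A tiny made-up "image": six pixel values from a single 8-bit channel.
pixels = [0, 0, 255, 128, 128, 128]
hist = intensity_histogram(pixels)
print(hist[0], hist[128], hist[255])  # 2 3 1
```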

There are four methods for comparing histograms, each with its own characteristics. (References: Histograms - OpenCV 2.4.13.7 documentation; ヒストグラムの比較 - にのせき日記)
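For intuition, the four measures can be sketched in plain Python, following the formulas in the OpenCV documentation. This is not OpenCV's implementation: the function names are made up, and the handling of empty bins and zero denominators is my assumption about sensible behavior.

```python
import math


def correlation(h1, h2):
    """Correlation: 1 for identical shapes, -1 for opposite ones."""
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    num = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    den = math.sqrt(sum((a - m1) ** 2 for a in h1) * sum((b - m2) ** 2 for b in h2))
    return num / den if den else 1.0  # assumption: flat histograms count as a match


def chi_square(h1, h2):
    """Chi-square: 0 for identical histograms, grows as they differ."""
    # assumption: bins where h1 is empty are skipped to avoid dividing by zero
    return sum((a - b) ** 2 / a for a, b in zip(h1, h2) if a != 0)


def intersection(h1, h2):
    """Intersection: the overlap of the two histograms, larger is more similar."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def bhattacharyya(h1, h2):
    """Bhattacharyya distance: 0 for identical, 1 for non-overlapping histograms."""
    norm = math.sqrt(sum(h1) * sum(h2))
    overlap = sum(math.sqrt(a * b) for a, b in zip(h1, h2))
    return math.sqrt(max(0.0, 1.0 - overlap / norm)) if norm else 0.0


same = [1, 2, 3, 4]
print(correlation(same, same), chi_square(same, same),
      intersection(same, same), bhattacharyya(same, same))
```

Note that for Correlation and Intersection a larger value means "more similar", while for Chi-square and the Bhattacharyya distance a smaller value does.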

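I wrote a program that sets a threshold for each method, picks out the groups of images whose similarity score is above the threshold (or below it, depending on the method), and displays them. Given a folder of images, it looks for similar ones among the images in that folder.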
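The "above or below, depending on the method" rule can be written as a small helper. The names LOWER_IS_MORE_SIMILAR and is_similar are made up for this sketch. The method numbers follow the program (0 Correlation, 1 Chi-square, 2 Intersection, 3 Bhattacharyya distance).

```python
# Methods whose score shrinks as two images become more alike.
LOWER_IS_MORE_SIMILAR = {1, 3}  # Chi-square, Bhattacharyya distance


def is_similar(score, method, threshold):
    """Decide whether a comparison score passes the threshold for a method."""
    if method in LOWER_IS_MORE_SIMILAR:
        return score < threshold
    return score > threshold  # Correlation, Intersection


print(is_similar(0.95, 0, 0.9))   # True: correlation above 0.9
print(is_similar(0.20, 3, 0.15))  # False: distance is not below 0.15
```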

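The program's __doc__ explains this as well, but in short:
set METHOD to a value from 0 to 3 to choose the method, and set THRESHOLD to a number to choose the threshold. Defaults are provided for both, but they are only my own rules of thumb, so feel free to tweak them.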

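The connected-components part of the program tidies up the results: for example, when image A is similar to image B and image B is also similar to image C, it outputs A, B, and C together as one group of similar images.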

	# ---- memo -----------------------
	# Find similar images among the
	# images in a folder
	# ---------------------------------
	import subprocess as sb
	from glob import glob
	from itertools import combinations
	from os import environ
	from sys import argv, exit

	import cv2

	METHOD_NAME = ['Correlation', 'Chi-square', 'Intersection', 'Bhattacharyya distance']
	THRESHOLD_DEF = [0.9, 100000, 1600000, 0.15]

	# environment variables
	METHOD = int(environ.get('METHOD', 3))
	THRESHOLD = float(environ.get('THRESHOLD', THRESHOLD_DEF[METHOD]))


	__doc__ = """
	Usage:
	    [METHOD=(int)] [THRESHOLD=(float)] python3 {f} folder_name

	environment variable
	    METHOD:
	        0: Correlation
	        1: Chi-square
	        2: Intersection
	        3: Bhattacharyya distance <- Default

	    THRESHOLD:
	        estimated THRESHOLD
	        METHOD = 0 -> 0.88 ~ 1 (Default is 0.9)
	        METHOD = 1 -> 80000 ~ 120000 (Default is 100000)
	        METHOD = 2 -> 1200000 ~ 2000000 (Default is 1600000)
	        METHOD = 3 -> 0.05 ~ 0.2 (Default is 0.15)

	example)
	    METHOD=0 python3 {f} folder_name
	    METHOD=3 THRESHOLD=0.2 python3 {f} folder_name
	""".format(f=__file__)


	def usage():
	    print(__doc__)
	    exit()


	def matching(folder_name):
	    print('METHOD: {}'.format(METHOD_NAME[METHOD]))
	    print('THRESHOLD: {}'.format(THRESHOLD))

	    # Build the image list (GIFs are excluded)
	    pictures = glob('{folder}/*'.format(folder=folder_name))
	    pictures = [pict for pict in pictures if pict.split('.')[-1] != 'gif']

	    # Compute the histograms (256 bins over channel 0 of the BGR image)
	    image_hists = dict()
	    for picture in pictures:
	        im = cv2.imread(picture)
	        if im is None:
	            # imread returns None for files it cannot read; skip them
	            continue
	        image_hists[picture] = cv2.calcHist([im], [0], None, [256], [0, 256])

	    # Compute the similarity of every pair of images
	    result = list()
	    for pictA, pictB in combinations(image_hists, 2):
	        tmp = cv2.compareHist(image_hists[pictA], image_hists[pictB], METHOD)
	        if METHOD in [1, 3]:
	            # Chi-square / Bhattacharyya: smaller means more similar
	            if tmp < THRESHOLD:
	                result.append((tmp, pictA, pictB))
	        else:
	            # Correlation / Intersection: larger means more similar
	            if tmp > THRESHOLD:
	                result.append((tmp, pictA, pictB))

	    # Compute the connected components
	    repr_dict = dict()  # representative -> members of its group
	    pict2repr = dict()  # picture -> representative of its group
	    for _, nodeA, nodeB in result:
	        is_A, is_B = nodeA in pict2repr, nodeB in pict2repr
	        if not is_A and not is_B:
	            # Neither picture is grouped yet: start a new group
	            repr_dict[nodeA] = [nodeA, nodeB]
	            pict2repr[nodeA] = nodeA
	            pict2repr[nodeB] = nodeA
	        elif is_A and not is_B:
	            repr_dict[pict2repr[nodeA]].append(nodeB)
	            pict2repr[nodeB] = pict2repr[nodeA]
	        elif not is_A and is_B:
	            repr_dict[pict2repr[nodeB]].append(nodeA)
	            pict2repr[nodeA] = pict2repr[nodeB]
	        else:
	            # Both are grouped: merge B's group into A's unless already the same
	            repr_a, repr_b = pict2repr[nodeA], pict2repr[nodeB]
	            if repr_a == repr_b:
	                continue
	            members = repr_dict.pop(repr_b)
	            repr_dict[repr_a] += members
	            for pict in members:
	                pict2repr[pict] = repr_a

	    # Print the results
	    # No similar images
	    if len(result) == 0:
	        print('Same pictures are not found.')
	        exit()

	    # Similar images found
	    print('Maybe same pictures ...')
	    for i, _repr in enumerate(set(pict2repr.values())):
	        print('[{}] {}'.format(i, ' '.join(repr_dict[_repr])))

	    # Ask whether to open the images
	    print('Open those?(y or n)')
	    ans = input()
	    if ans in ['y', 'yes']:
	        for _repr in set(pict2repr.values()):
	            component = repr_dict[_repr]
	            # `open` is the macOS command for opening files
	            sb.run(['open'] + component)

	    return 1


	if __name__ == '__main__':
	    if len(argv) == 2 and argv[1] not in ['-h', '--help']:
	        # Drop one trailing slash from the folder name, if present
	        folder_name = argv[1] if argv[1][-1] != '/' else argv[1][:-1]
	        matching(folder_name)
	    else:
	        usage()
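As an aside, the grouping step can also be written with a union-find (disjoint-set) structure. This is a different technique from the dictionary bookkeeping in the script, shown only as a compact alternative. The function name group_similar and the sample pairs are made up for illustration.

```python
def group_similar(pairs):
    """Group items so that any two linked by a chain of pairs end up together."""
    parent = {}

    def find(x):
        # Walk up to the root of x's group, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge b's group into a's

    groups = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)
    return [sorted(members) for members in groups.values()]


# A-B and B-C are similar, so A, B, C form one group; D-E form another.
groups = group_similar([("A", "B"), ("B", "C"), ("D", "E")])
print(sorted(groups))  # [['A', 'B', 'C'], ['D', 'E']]
```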


Results

First, I created a folder named ネッコ. [image: the ネッコ folder]

Running the script above with python3 image_compare.py ~/ネッコ/

produced four groups of similar images. Here they are:
[image: the four groups of similar images]

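The groups do look like they contain fairly similar images.
You can switch methods with METHOD=0 python3 image_compare.py ~/ネッコ/,
or change the threshold with something like METHOD=3 THRESHOLD=0.12 python3 image_compare.py ~/ネッコ/, and the results will change accordingly.
Try out different settings.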
