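For the clustering algorithm itself, see this article; this post plots the data that the clustering algorithm produced.
First the data needs some preprocessing. In my DBSCAN implementation, the clusterID values in the final output are not consecutive, so to make plotting easier I renumber the clusterID of every point sequentially, starting from 0. The code for this step is: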
    /*
    deal.cpp
    input: out.txt (x, y, clusterID)
    9.000000 1.000000 5
    10.000000 10.000000 16
    2.100000 7.100000 1
    1.100000 1.100000 1
    1.100000 2.100000 1
    1.100000 3.100000 1
    1.100000 4.100000 1
    8.000000 8.000000 6
    9.000000 8.000000 6
    1.100000 5.100000 1
    2.100000 1.100000 1
    2.100000 2.100000 1
    2.100000 3.100000 1
    2.100000 4.100000 1
    2.100000 5.100000 1
    2.100000 6.100000 1
    8.000000 9.000000 6
    output: pic.txt
    2.1 7.1 0
    1.1 1.1 0
    1.1 2.1 0
    1.1 3.1 0
    1.1 4.1 0
    2.1 6.1 0
    2.1 5.1 0
    1.1 5.1 0
    2.1 1.1 0
    2.1 2.1 0
    2.1 3.1 0
    2.1 4.1 0
    9 1 1
    8 9 2
    9 8 2
    8 8 2
    10 10 3
    */
    #include <algorithm>
    #include <cstdio>
    #include <iostream>
    using namespace std;

    const int MAXN = 200000;

    struct point {
        double a, b;  // x and y coordinates
        int c;        // cluster ID
    } p[MAXN];

    // Order points by cluster ID so that equal IDs end up adjacent.
    bool cmp(const point &x, const point &y) {
        return x.c < y.c;
    }

    int main() {
        freopen("out.txt", "r", stdin);
        int id = 0;  // number of points read so far
        // Stop at EOF, on a malformed line, or when the array is full.
        while (id < MAXN && scanf("%lf %lf %d", &p[id].a, &p[id].b, &p[id].c) == 3) {
            id++;
        }
        sort(p, p + id, cmp);
        int flag = -1, pre = -1;  // flag: current new ID, pre: previous original ID
        freopen("pic.txt", "w", stdout);
        for (int i = 0; i < id; i++) {
            cout << p[i].a << " " << p[i].b << " ";
            // Start a new ID at the first point and whenever the original ID changes.
            if (i == 0 || p[i].c != pre) {
                cout << ++flag << endl;
                pre = p[i].c;
            } else {
                cout << flag << endl;
            }
        }
        return 0;
    }
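For readers who would rather keep the whole pipeline in Python, the same renumbering can be sketched as below. This is only an illustration, not part of the original deal.cpp: the helper name `renumber_clusters` and the in-memory tuple format are assumptions. Python's sort is stable, so points keep their input order within a cluster, which can differ from the order `std::sort` produces.

```python
def renumber_clusters(points):
    """Map arbitrary cluster IDs to 0, 1, 2, ... in ascending ID order.

    points: list of (x, y, cluster_id) tuples.
    Returns a new list sorted by original cluster ID, with dense IDs.
    """
    # Hypothetical helper: build a lookup from original ID to dense ID.
    dense = {cid: i for i, cid in enumerate(sorted({p[2] for p in points}))}
    # Stable sort: points of the same cluster keep their input order.
    ordered = sorted(points, key=lambda p: p[2])
    return [(x, y, dense[cid]) for x, y, cid in ordered]


sample = [(9.0, 1.0, 5), (10.0, 10.0, 16), (2.1, 7.1, 1), (8.0, 8.0, 6)]
print(renumber_clusters(sample))
```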
Then plot the data with Python.
    # coding=utf-8
    import os
    import sys

    import matplotlib.pyplot as plt

    # Supports points in 8 different colors (cluster IDs 0-7).
    # Note: 'w' (white) does not show on a white background.
    color_list = ['b', 'c', 'g', 'k', 'm', 'r', 'w', 'y']


    def read_data(filename, xmax=11.0, ymax=11.0):
        try:
            with open(filename) as f:
                for row in f:
                    fields = row.split()
                    if len(fields) != 3:
                        continue  # skip blank or malformed lines
                    x, y, n = fields
                    c = color_list[int(n)]
                    # Convert the text fields to numbers before plotting.
                    draw_axes(xmax, ymax, float(x), float(y), c)
        except FileNotFoundError as e:
            print('No such file: ', e)
            sys.exit(-1)


    def draw_axes(xmax, ymax, x, y, color):
        plt.axis((0, float(xmax), 0, float(ymax)))
        plt.scatter(x, y, c=color)


    if __name__ == '__main__':
        filename = input('Please enter filename:')
        xmax = input('Please input xmax:')
        ymax = input('Please input ymax:')
        # os.path.join keeps the path portable instead of hard-coding '\\'.
        filename = os.path.join(os.getcwd(), filename)
        read_data(filename, xmax, ymax)
        plt.show()
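Result:
The plotting code defines 8 colors. It just occurred to me that when more colors are needed, the color can be picked by taking the clusterID modulo the number of colors. Every cluster still gets a single color, although different clusters may share one. With this approach, deal.cpp is no longer needed to preprocess the data.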
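The modulo idea could look roughly like the sketch below, which works directly on the raw clusterID values from out.txt. The names `PALETTE`, `color_for`, and `group_by_color` are made up for illustration, and the palette is an assumption: it is the list from the plotting script with 'w' left out, since white points would not show on a white background.

```python
from collections import defaultdict

# Assumed palette: the plotting script's colors without 'w' (white).
PALETTE = ['b', 'c', 'g', 'k', 'm', 'r', 'y']


def color_for(cluster_id, palette=PALETTE):
    """Pick a color by taking the raw cluster ID modulo the palette size."""
    return palette[cluster_id % len(palette)]


def group_by_color(points, palette=PALETTE):
    """Collect x and y coordinates per color, one group per color."""
    groups = defaultdict(lambda: ([], []))
    for x, y, cluster_id in points:
        xs, ys = groups[color_for(cluster_id, palette)]
        xs.append(x)
        ys.append(y)
    return dict(groups)


raw = [(9.0, 1.0, 5), (10.0, 10.0, 16), (2.1, 7.1, 1), (8.0, 8.0, 6), (1.1, 1.1, 1)]
for color, (xs, ys) in group_by_color(raw).items():
    print(color, xs, ys)
```

Each group can then be handed to a single `plt.scatter(xs, ys, c=color)` call, which also avoids one scatter call per point. With 7 colors, clusters whose IDs differ by a multiple of 7 (for example 1 and 8) share a color.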