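For the clustering algorithm itself, see this article; this post plots figures from the data that the clustering algorithm produces.
First, the data needs a little preprocessing. In my DBSCAN implementation, the clusterID values in the final output are not consecutive, so to make plotting easier I renumber the clusterID of every point consecutively, starting from 0. The code for this step is: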
/*
deal.cpp
input: out.txt(x, y, clusterID)
9.000000 1.000000 5
10.000000 10.000000 16
2.100000 7.100000 1
1.100000 1.100000 1
1.100000 2.100000 1
1.100000 3.100000 1
1.100000 4.100000 1
8.000000 8.000000 6
9.000000 8.000000 6
1.100000 5.100000 1
2.100000 1.100000 1
2.100000 2.100000 1
2.100000 3.100000 1
2.100000 4.100000 1
2.100000 5.100000 1
2.100000 6.100000 1
8.000000 9.000000 6
output: pic.txt
2.1 7.1 0
1.1 1.1 0
1.1 2.1 0
1.1 3.1 0
1.1 4.1 0
2.1 6.1 0
2.1 5.1 0
1.1 5.1 0
2.1 1.1 0
2.1 2.1 0
2.1 3.1 0
2.1 4.1 0
9 1 1
8 9 2
9 8 2
8 8 2
10 10 3
*/
#include <algorithm>
#include <cstdio>
#include <iostream>
using namespace std;

const int MAXN = 200000;

struct point {
    double a, b;  // x and y coordinates
    int c;        // original clusterID
} p[MAXN];

// Order points by original clusterID so that each cluster is contiguous.
bool cmp(const point &x, const point &y) {
    return x.c < y.c;
}

int main() {
    freopen("out.txt", "r", stdin);
    int id = 0;
    // Stop at EOF, on a malformed line, or when the array is full.
    while (id < MAXN && scanf("%lf %lf %d", &p[id].a, &p[id].b, &p[id].c) == 3) {
        id++;
    }
    sort(p, p + id, cmp);
    int flag = -1, pre = -1;
    freopen("pic.txt", "w", stdout);
    for (int i = 0; i < id; i++) {
        cout << p[i].a << " " << p[i].b << " ";
        // A change in the original ID starts the next consecutive ID.
        // The i == 0 check also covers a first cluster whose ID is -1.
        if (i == 0 || p[i].c != pre) {
            cout << ++flag << endl;
            pre = p[i].c;
        } else {
            cout << flag << endl;
        }
    }
    return 0;
}
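For readers who would rather stay in Python, the same renumbering can be sketched as follows. This is only an illustration and not part of the original code: the function name `renumber_clusters` and the in-memory list of tuples are assumptions, and the mapping follows ascending order of the original ID to match what deal.cpp does.

```python
def renumber_clusters(rows):
    """Map arbitrary cluster IDs to 0..k-1 in ascending order of the original ID.

    rows: iterable of (x, y, cluster_id) tuples (hypothetical in-memory format).
    Returns a new list sorted by original cluster ID, like deal.cpp's output.
    """
    rows = list(rows)
    # Distinct original IDs, sorted, each paired with its new consecutive ID.
    mapping = {cid: new for new, cid in enumerate(sorted({r[2] for r in rows}))}
    # sorted() is stable, so points keep their input order within a cluster.
    return [(x, y, mapping[cid]) for x, y, cid in sorted(rows, key=lambda r: r[2])]


# A few points taken from the sample out.txt above.
sample = [(9.0, 1.0, 5), (10.0, 10.0, 16), (2.1, 7.1, 1), (8.0, 8.0, 6)]
result = renumber_clusters(sample)
```

Here the original IDs 1, 5, 6, 16 become 0, 1, 2, 3, which is the same assignment shown in the sample pic.txt.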
Then plot the data with Python.
# coding=utf-8
import os
import sys
import matplotlib.pyplot as plt

# Supports 8 point colors (clusterID 0-7).
# 'orange' replaces 'w': white points are invisible on the default white background.
color_list = ['b', 'c', 'g', 'k', 'm', 'r', 'orange', 'y']


def read_data(filename, xmax=11.0, ymax=11.0):
    try:
        with open(filename) as f:
            for row in f:
                fields = row.split()
                if len(fields) != 3:
                    continue  # skip blank or malformed lines
                x, y, n = fields
                c = color_list[int(n)]
                # Convert to float so the points are plotted numerically,
                # not as string categories.
                draw_axes(xmax, ymax, float(x), float(y), c)
    except FileNotFoundError as e:
        print('No such file: ', e)
        sys.exit(-1)


def draw_axes(xmax, ymax, x, y, color):
    plt.axis((0, float(xmax), 0, float(ymax)))
    plt.scatter(x, y, c=color)


if __name__ == '__main__':
    filename = input('Please enter filename:')
    xmax = input('Please input xmax:')
    ymax = input('Please input ymax:')
    # os.path.join works on every platform, unlike a hard-coded '\\'.
    filename = os.path.join(os.getcwd(), filename)
    read_data(filename, xmax, ymax)
    plt.show()
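Result:
The plotting code defines 8 colors. It just occurred to me that if many more clusters need coloring, you can choose the color by taking clusterID modulo the number of colors. Every point in a cluster then still gets the same color, although different clusters may end up sharing one. With this approach, deal.cpp is no longer needed to preprocess the data.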
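The modulo idea can be sketched roughly like this. It is a minimal illustration under assumptions, not the code used for the figure: the helper name `color_for` is hypothetical, and the 7-color palette is chosen here only to leave out white.

```python
# Hypothetical palette; any list of matplotlib color names would work.
palette = ['b', 'c', 'g', 'k', 'm', 'r', 'y']


def color_for(cluster_id, colors=palette):
    """Pick a color by wrapping the raw cluster ID around the palette."""
    # In Python, % with a positive modulus never returns a negative value,
    # so an ID such as -1 still yields a valid index.
    return colors[cluster_id % len(colors)]


# Raw (non-consecutive) IDs from the sample out.txt.
chosen = {cid: color_for(cid) for cid in (1, 5, 6, 16)}
```

With this palette the raw IDs 1, 5, 6 and 16 get four different colors, while an ID of 8 would reuse the color of cluster 1, which is the repetition mentioned above.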
