Commit 08211f38 authored by Thomas Weighill

Thomas added snippets and dense vectors

parent 1c790cb5
+44 −0
%% Cell type:code id:a144fa2a-d7a5-4c29-9223-bc17d8a09f42 tags:

``` python
import numpy as np
import pandas as pd
from tqdm import tqdm
import geopandas as gpd
import matplotlib.pyplot as plt
import networkx as nx
```

%% Cell type:markdown id:46a6719b-5757-4068-8f6b-b185a4595de2 tags:

## Reformatting cluster labels

Currently, the clustering algorithm outputs a dataframe with the columns `plan_index`, `district`, `row_index`, `population`, and `cluster_label`.

For more efficient storage, we prefer a smaller dataframe with only `plan_index`, `district`, and `cluster_label`, containing one row per district of each plan.
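As a rough illustration of the reshaping described above, the sketch below collapses a small, entirely hypothetical dataframe in the long format to one row per `(plan_index, district)`. It assumes, as the notebook does, that every row of a given district carries the same `cluster_label`.

```python
import pandas as pd

# Hypothetical toy data in the long format: several rows per
# (plan_index, district), all sharing one cluster_label.
long_df = pd.DataFrame({
    "plan_index":    [0, 0, 0, 0, 1, 1, 1, 1],
    "district":      [1, 1, 2, 2, 1, 1, 2, 2],
    "row_index":     [0, 1, 2, 3, 0, 1, 2, 3],
    "population":    [100, 120, 90, 110, 105, 95, 130, 80],
    "cluster_label": [4, 4, 7, 7, 4, 4, 2, 2],
})

# Keep one row per (plan_index, district) and only the cluster_label column.
# Selecting the column before aggregating avoids computing max() for every column.
short_df = long_df.groupby(["plan_index", "district"])[["cluster_label"]].max()

print(short_df)
print(len(long_df), "->", len(short_df))  # prints: 8 -> 4
```

The result is indexed by `(plan_index, district)`, so `to_csv` writes those two index levels as the leading columns of the output file.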

%% Cell type:code id:73993582-f1a7-4daa-b3f8-f9d1920bd4f4 tags:

``` python
num_clusters = 30
input_filename = 'data/processed/centroids/ensemble_with_cluster_labels_k30.csv'
output_filename = 'cluster_labels_k30.csv'
```

%% Cell type:code id:5a03e2da-f6ee-425a-b396-b56c3d9a3685 tags:

``` python
df = pd.read_csv(input_filename)  # load input file
```

%% Cell type:code id:28d0ff7d-f02c-4937-af6c-3f0264dffdeb tags:

``` python
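# Group by plan_index and district and retain the cluster label; assuming cluster_label is constant within each group, max() simply recovers it
newdf = df.groupby(by=['plan_index', 'district'])[['cluster_label']].max()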
```
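Taking the `max()` of each group is only a faithful reduction if each `(plan_index, district)` group has a single cluster label. A minimal check of that assumption might look like the following; the helper name `labels_constant_per_district` is hypothetical and not part of the original notebook.

```python
import pandas as pd


def labels_constant_per_district(df: pd.DataFrame) -> bool:
    """Return True if every (plan_index, district) group has exactly one distinct cluster_label.

    nunique() ignores NaN by default, so a group whose labels are all missing
    counts as 0 distinct labels and makes this return False.
    """
    n_labels = df.groupby(["plan_index", "district"])["cluster_label"].nunique()
    return bool((n_labels == 1).all())
```

Running `assert labels_constant_per_district(df)` before the `groupby(...).max()` step would flag any district whose rows disagree on their label, instead of silently keeping the largest one.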

%% Cell type:code id:4a154a8f-7580-40d7-a93b-35a13a0c0220 tags:

``` python
newdf[['cluster_label']].to_csv(output_filename)  # output cluster label information to output file
```
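Because the grouped dataframe is indexed by `(plan_index, district)`, the CSV stores those index levels as its first two columns. A sketch of reading such a file back with the same index, using an in-memory buffer and made-up labels in place of the real output file, could be:

```python
import io

import pandas as pd

# Hypothetical stand-in for the grouped dataframe written above.
short_df = pd.DataFrame(
    {"cluster_label": [4, 7, 4, 2]},
    index=pd.MultiIndex.from_tuples(
        [(0, 1), (0, 2), (1, 1), (1, 2)], names=["plan_index", "district"]
    ),
)

# Write to an in-memory buffer instead of a real file.
buffer = io.StringIO()
short_df.to_csv(buffer)
buffer.seek(0)

# The index levels were written as the leading columns, so restore them with index_col.
restored = pd.read_csv(buffer, index_col=["plan_index", "district"])
```

With `index_col` set, `restored.loc[(plan_index, district), "cluster_label"]` looks up the label for one district of one plan directly.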
+1119959 −0

File added.


+1119959 −0

File added.


+1119959 −0

File added.


+2.76 GiB

File added.
