import platform
print("Python:", platform.python_version())
import numpy as np
print("Numpy:", np.__version__)
import spacy
print("spacy:", spacy.__version__)
Python: 3.9.18
Numpy: 1.24.4
spacy: 3.7.4
Load data (LLM and Episode model statements)
nlp = spacy.load("en_core_web_lg")
with open("data/items.txt", "r") as file:
    content = file.readlines()
print(content[0])    # print the first item
print(content[494])  # print the last item
The music made me feel calm and relaxed
I preferred listening to songs I knew rather than ones I didn't know
Calculate similarities
This calculates the full similarity matrix: 495 x 495 = 245,025 comparisons. Only the upper triangle without the diagonal is actually needed, which would be 495 x 494 / 2 = 122,265 comparisons. This is inefficient, but I’ll let it slip as a one-off operation. :-)
N = 495
df = np.zeros((N, N))
for k in range(N):
    for l in range(N):
        # parse both items and compare their document vectors
        doc1 = nlp(str(content[k]))
        doc2 = nlp(str(content[l]))
        similarity = doc1.similarity(doc2)
        df[k, l] = similarity
np.savetxt("data/similarity.csv", df, delimiter=',', fmt='%.5f')
print('Matrix completed')
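If the run time ever becomes a problem, the inefficiency noted above could be avoided by parsing each item once and computing all similarities in a single matrix product. The following is only a sketch, not the code used for the results here. It assumes that `doc.similarity()` is the cosine similarity of the two document vectors, which is spaCy's default for pipelines with word vectors such as `en_core_web_lg`. The function name `cosine_similarity_matrix` and the toy vectors are made up for illustration.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Return the full cosine-similarity matrix for the rows of `vectors`."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Rows with zero norm (e.g. an item with no known word vectors) stay all-zero,
    # so their similarity to everything, including themselves, comes out as 0.
    unit = np.divide(vectors, norms, out=np.zeros_like(vectors), where=norms > 0)
    return unit @ unit.T


# Toy stand-in for the document vectors (hypothetical data, three "items").
toy_vectors = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
sim = cosine_similarity_matrix(toy_vectors)

# Only the pairs above the diagonal are unique comparisons.
upper = sim[np.triu_indices(len(toy_vectors), k=1)]

# With spaCy, the vectors would come from parsing each item once, e.g.:
#   docs = list(nlp.pipe(content))
#   vectors = np.array([doc.vector for doc in docs])
#   df = cosine_similarity_matrix(vectors)
```

Under that assumption the result should match the double loop up to floating-point rounding, while calling `nlp` 495 times instead of 490,050 times.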