Academic Project • Fall 2023 • Group Work with Anudeep Chawla, Katsuya Masaki
The API is designed to seamlessly bridge the gap between different forms of media by providing tailored recommendations. Leveraging metadata analysis and advanced language models, we are aiming to transform input media into curated recommendations in a different format. For example, a use-case of the model is that a user would give a list of 5 songs, and get 5 books recommended.
The input media will be evaluated based on metadata such as the genre and reviews. The recommendations will be calculated using GPT-4. The recommendations will persist in a database so that they can be retrieved at a later time.
The API aims to provide seamless cross-media recommendations by utilizing metadata analysis and advanced language models, while also ensuring the availability of recommendations through persistent storage in a database.
1. Input 2. Get book description 3. Get a word embedding 4. Find similar vectors in DB 5. Output
Data Collection
Picked 3,000 songs from Spotify’s popular tunes in multiple genres
Fetch lyrics through Genius API
Get word embeddings by sentence-transformer
Insert the word embeddings and metadata (title, artist) in a vector DB
Database
We chose Pinecone as our vector DB since:
They offer a free plan/computing credits
They provide nice APIs easy to integrate with python application
The community is very active and a plenty of resource for trouble-shooting are available