Vector-search | Bits And Music

Playlist2vec: DIY Autoscaler For Docker Swarm - 2

Wed, 01 Jan 2025 00:00:00 +0000

Disclaimer 1: This post continues the last post about building a playlist search and discovery application on a Raspberry Pi powered by the sequence-to-sequence model described in the post "Building Music Playlists Recommendation System."

Disclaimer 2: The design choices mentioned in this article are made keeping in mind a low-cost setup. As a result, some of the design choices may not be the most straightforward.

Disclaimer 3: You can explore the vector search application, playlist2vec, here: https://playlist2vec.com/. You can find the code for the demo application here.

Brief Summary

In the initial setup for our vector search application, we had a NodeJS (Express JS) web server and FastAPI microservices for autocomplete and vector search features. All three are deployed as docker containers for ease of installation and scalability. We use the USearch vector search library for vector search, and for autocomplete, we use a Python-based Directed Word Graph library called fast-autocomplete. The entire setup is behind Nginx, which acts as a reverse proxy for our setup.

Playlist2vec design architecture. Scalable microservice-based architecture with NodeJS webserver, FastAPI-based vector search and autocomplete APIs, and Nginx as a reverse proxy.

Limitations

Although our setup was designed to accommodate HTTPS and scaling, neither was actually implemented. In this post, we will focus on adding scaling capability to our application using Docker Swarm.

Docker Swarm: From Containers To Services

Docker Swarm enables the deployment and management of multiple instances of applications, ensuring high availability and resilience. The key distinction between deploying an application in swarm mode and using conventional Docker Compose lies in the added abstraction of services.

Containers are units of deployment that have their own runtime. They encapsulate an application and its dependencies, including libraries, binaries, and configuration files, into a single, lightweight package. Each container runs in isolation from others, sharing the host operating system's kernel but maintaining its own filesystem, processes, and network stack.

On the other hand, services represent a higher-level abstraction that defines how a specific application or a set of related applications should run in a container orchestration platform like Docker Swarm or Kubernetes. A service specifies the desired state for a group of containers, including the number of replicas (instances) to run, the networking configuration, and the load-balancing strategy. When you create a service, the orchestration platform automatically manages the deployment and scaling of the underlying containers to meet the defined specifications.

Scaling Up The Setup

Before transitioning to a Docker Swarm setup, we take the following steps to incorporate scaling into our configuration:

Add an additional Raspberry Pi to our machine cluster, ensuring that both machines can communicate with each other.
Modify the vector search implementation to be memory-based, moving away from the previous MMAP-based one. This change allows us to better understand the resource requirements associated with scaling.

Docker Swarm Config

Here’s a snippet of the docker-compose.yaml file for one of the services, autocomplete-service:

autocomplete-service:
    build: ./autocomplete-service
    image: ${REGISTRY_HOST}:${REGISTRY_PORT}/autocomplete-image:latest
    networks:
      - p2v-network
    env_file:
      - .env
    deploy:
      replicas: 2

When the docker stack is deployed, this configuration scales the autocomplete-service to run multiple instances (2 in this case). This setup enables the service to handle significantly more traffic compared to a single-instance configuration, enhancing its ability to manage increased load effectively.

Autoscaling

Autoscaling is a cloud computing feature that automatically adjusts a service's number of active instances based on current demand. This ensures optimal resource utilization, maintains performance, and minimizes costs by dynamically scaling resources up or down in response to varying workloads.

The core concept of an autoscaler is to define conditions that trigger scaling actions. Common criteria for scaling include CPU usage, memory consumption, or the number of requests indicative of traffic load.

An autoscaler can operate in two primary ways:

Event-Driven Scaling: Scaling actions are triggered by specific events, such as a sudden spike in traffic.
Polling-Based Scaling: A service continuously monitors a metric and initiates scaling actions when that metric crosses a defined threshold.

Docker Swarm does not natively support autoscaling capabilities, unlike Kubernetes, which offers robust autoscaling features. However, it is possible to implement a basic autoscaling solution within an existing Docker Swarm setup.

DIY Autoscaling

In this implementation, we focus on the number of requests within a specific timeframe as the primary condition for autoscaling. We use Nginx's access.log as our primary source of information for the requests logged.

Our approach employs a polling mechanism with a 15-second interval. A bash script runs every 15 seconds to:

Retrieve each endpoint's total number of requests within the last 15 seconds by using awk.
Determine the required number of replicas from the request count using our custom logic, implementing load-based scaling.
Execute the docker service scale command to adjust the number of replicas horizontally, effectively increasing or decreasing the number of service instances.

Here's a snippet of code as an example to get the total request count for an endpoint from the Nginx logs:

awk -v start="$(date --date='15 seconds ago' '+%d/%b/%Y:%H:%M:%S')" \
    -v end="$(date '+%d/%b/%Y:%H:%M:%S')" \
    '$4 >= "["start"]" && $4 <= "["end"]" && $7 ~ /\/populate/ {count++} \
    END {print count+0}' /var/log/nginx.log
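The same counting step can be sketched in Python. This is only an illustration, not the script used in the setup: it assumes Nginx's default "combined" log format, and the function name `count_requests` and the sample log lines are hypothetical. Parsing the timestamps into `datetime` objects avoids relying on string comparison, which is not chronological across day boundaries.

```python
from datetime import datetime, timedelta

# Timestamp layout in Nginx's default "combined" log format, e.g. 01/Jan/2025:00:00:10
NGINX_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S"


def count_requests(log_lines, path_fragment, start, end):
    """Count lines whose timestamp lies in (start, end] and whose path contains path_fragment."""
    count = 0
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip malformed or truncated lines
        try:
            # Field 4 looks like "[01/Jan/2025:00:00:10"; drop the leading bracket.
            timestamp = datetime.strptime(fields[3].lstrip("["), NGINX_TIME_FORMAT)
        except ValueError:
            continue
        if start < timestamp <= end and path_fragment in fields[6]:
            count += 1
    return count


# Hypothetical sample lines in the combined log format.
sample_log = [
    '10.0.0.1 - - [01/Jan/2025:00:00:05 +0000] "GET /populate?q=rock HTTP/1.1" 200 512 "-" "curl/8.0"',
    '10.0.0.2 - - [01/Jan/2025:00:00:10 +0000] "GET /search?id=42 HTTP/1.1" 200 2048 "-" "curl/8.0"',
    '10.0.0.3 - - [01/Jan/2025:00:00:12 +0000] "GET /populate?q=jazz HTTP/1.1" 200 498 "-" "curl/8.0"',
    '10.0.0.4 - - [31/Dec/2024:23:59:40 +0000] "GET /populate?q=old HTTP/1.1" 200 100 "-" "curl/8.0"',
]
window_end = datetime(2025, 1, 1, 0, 0, 15)
window_start = window_end - timedelta(seconds=15)
populate_count = count_requests(sample_log, "/populate", window_start, window_end)
print(populate_count)
```

The half-open window (start excluded, end included) means consecutive polls do not count the same request twice.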

To establish the relationship between the number of requests and the required replicas, we conduct load testing on our services using the load testing tool k6. By performing constant-rate arrival tests, we identify the maximum requests a single Docker instance can handle for each service on our specific hardware. This data informs our autoscaling setup, ensuring we can effectively manage resource allocation in response to varying traffic demands.
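One simple way to turn a load-test result into a replica count is to divide the observed request rate by the rate a single replica can sustain, then clamp the result. The sketch below assumes this proportional rule. The function name, the clamp limits, and the figure of 20 requests per second are hypothetical, not measurements from our setup.

```python
import math


def required_replicas(request_count, window_seconds, max_rps_per_replica,
                      min_replicas=1, max_replicas=4):
    """Translate a request count observed in one polling window into a replica count."""
    if window_seconds <= 0 or max_rps_per_replica <= 0:
        raise ValueError("window_seconds and max_rps_per_replica must be positive")
    observed_rps = request_count / window_seconds
    needed = math.ceil(observed_rps / max_rps_per_replica)
    # Clamp so the service never scales to zero or beyond what the cluster can host.
    return max(min_replicas, min(max_replicas, needed))


# Hypothetical figure: suppose load testing showed one replica sustains 20 requests/second.
replicas = required_replicas(request_count=450, window_seconds=15, max_rps_per_replica=20)
print(replicas)
```

Here 450 requests in 15 seconds is 30 requests per second, which needs two replicas at 20 requests per second each.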

// Autoscaling Pseudocode

// Read request counts from the Nginx access log for each endpoint
request_counts = read_log_parser_output()

// Determine the required number of service replicas from the request
// counts, using our custom logic and prior load testing
required_service_replicas_lookup = get_scale(request_counts)

// Execute scaling commands for each service
FOR EACH service, replicas IN required_service_replicas_lookup:
    scale_service(service, replicas)
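The final step of the pseudocode could look roughly like the following Python sketch. It builds one `docker service scale SERVICE=REPLICAS` command per service and, unless `dry_run` is disabled, only returns the commands instead of running them. The function names and the stack and service names are hypothetical.

```python
import subprocess


def build_scale_commands(required_replicas_lookup):
    """Build one `docker service scale` argument list per service."""
    commands = []
    for service, replicas in sorted(required_replicas_lookup.items()):
        if replicas < 1:
            raise ValueError(f"refusing to scale {service} below one replica")
        commands.append(["docker", "service", "scale", f"{service}={replicas}"])
    return commands


def apply_scaling(required_replicas_lookup, dry_run=True):
    """Run the scale commands, or only return them when dry_run is set."""
    commands = build_scale_commands(required_replicas_lookup)
    if not dry_run:
        for command in commands:
            # Must be executed on a Swarm manager node.
            subprocess.run(command, check=True)
    return commands


# Hypothetical names; services deployed with `docker stack deploy` are prefixed with the stack name.
planned = apply_scaling({"p2v_search-service": 3, "p2v_autocomplete-service": 2})
for command in planned:
    print(" ".join(command))
```

Keeping command construction separate from execution makes the scaling decision easy to test without a running Swarm.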

Thoughts

The Good

Low-Cost Setup Which Works

This setup is ideal for small to medium-scale projects due to its low resource requirements. It avoids the complexity of more advanced frameworks like Kubernetes, requires only awk and a Python installation, and is simple to set up, deploy, and manage.

Customizability

It offers greater control over the autoscaling logic, allowing for adjustments such as adding custom logic to monitor additional metrics, making the scaling logic more sophisticated or simply modifying the polling duration.

The Bad

Dependency on Load Testing

Given that our scaling setup uses predefined load thresholds, the primary limitation of our DIY setup is its dependence upon manual load testing to determine appropriate scaling thresholds.

Polling Limitations

Another limitation of our polling-based scaling setup is that it may miss traffic peaks since any decision on whether to scale or not can come only after a predefined duration of 15 seconds, leading to delayed scaling responses.

Clunky Setup

Given that the setup involves setting up a cron job to run every 15 seconds (which needs a workaround, since cron's finest granularity is one minute), setting up the correct path to the Nginx logs, the autoscale scripts, etc., it can feel quite clunky compared to industry-standard orchestration platforms such as Kubernetes.

Limited Metrics

The setup only considers the number of incoming requests read from the Nginx logs. It does not consider other vital metrics, such as CPU and memory usage, which would be valuable indicators when evaluating scaling needs. Tools such as cAdvisor, which collect container metrics such as CPU and memory usage, can be added to this setup to get a more complete picture before deciding to scale.

Conclusion

We added a simple (auto)scaling capability to our vector search application deployed on a cluster of Raspberry Pis. The setup is very low-cost but has limitations, such as being prone to missing traffic peaks, requiring manual load testing beforehand, and considering only a limited set of metrics for scaling. Moving to a platform with standardized autoscaling, such as Kubernetes, would be the next step.

Until the next iteration.

Playlist2vec: A Raspberry-Pi Powered Vector Search System - 1

Wed, 04 Dec 2024 00:00:00 +0000

Disclaimer 1: The design choices mentioned in this article are made keeping in mind a low-cost setup. As a result, some of the design choices may not be the most straightforward ones.

Disclaimer 2: You can explore the vector search application, playlist2vec, here: https://playlist2vec.com/. You can find the code for the demo application here.

Introduction

In 2019, we published a paper titled "Representation, Exploration, and Recommendation of Music Playlists." In this work, we utilized sequence-to-sequence models to create playlist embeddings, which can be employed for various downstream tasks like search and discovery. You can see these embeddings in action at playlist2vec.com. The purpose of this post is to explain how we built Playlist2Vec, a playlist search application powered by the embeddings mentioned earlier.

Main Features

The main features of the app are:

A search box with typeahead search where users can enter the name of the item they are looking for.
After the user selects a playlist name and submits it, the system displays playlists similar to the one queried.
The app provides Spotify URLs for the items, allowing users to easily navigate to them from the results page.

Design Considerations

The typeahead search must be instantaneous.
The vector search should be capable of completing in under 2 seconds on a low-cost machine, like a Raspberry Pi, under normal traffic load.
Both vector and full-text search should be deployable onto a single machine with 4GB of RAM.

Developer Friendly Outline

We primarily wanted a setup with a lower footprint but still scalable if needed. So, we designed our system using a microservice architecture so that the system components are decoupled from each other and can be horizontally scaled independently. With that in consideration, our tech stack looks like this for the app:

NodeJS (ExpressJS) webserver
FastAPI for building two of our APIs; one is the search API for vector search, and the second is the autocomplete API for the typeahead search.
USearch Vector Search Library for vector search. Given a query playlist, similar playlists are found from a corpus of 745,543 playlists using vector search. This particular library was chosen because of its speed and mmap* support.
Fast Autocomplete Python library, a Directed Word Graph-based library for the typeahead search.
SQLite database to store additional details for playlists such as Spotify ID, name, and cover image link. This specific database was chosen for its portability and small footprint.
Docker containers to run the APIs and the webserver to facilitate horizontal scaling.
Nginx, as a reverse proxy for our setup, so that features such as caching, rate limiting, etc., do not have to be baked into the code. It is installed on the host machine instead of running as a Docker container**.

Overall System Design

Here's what the workflow looks like:

Playlist2vec design architecture. Scalable microservice-based architecture with NodeJS webserver, FastAPI-based vector search and autocomplete APIs, and Nginx as a reverse proxy.

When you begin typing the item you want to search for, Nginx will return a cached response if one is available.
If there is no cached response, the request is forwarded to the web server, which then sends it to the autocomplete API. The API returns a list of suggested item names and their corresponding IDs.
Once the user selects an item from the list, the ID is sent to the web server, which forwards it to the search API. The USearch vector index retrieves the k-closest results.
Additional details about these closest results, such as the playlist name, ID, and the cover image link, are then fetched from the SQLite database and returned to the browser.
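The last step of the workflow, fetching details for the closest results from SQLite, can be sketched as follows. The table name, column names, and sample rows are hypothetical, and an in-memory database is used only so the example is self-contained. The list `nearest_ids` stands in for the IDs returned by the vector index.

```python
import sqlite3


def fetch_playlist_details(connection, playlist_ids):
    """Fetch display details for the given IDs, preserving the ranking order of playlist_ids."""
    if not playlist_ids:
        return []
    placeholders = ",".join("?" for _ in playlist_ids)
    rows = connection.execute(
        f"SELECT id, name, spotify_id, cover_url FROM playlists WHERE id IN ({placeholders})",
        list(playlist_ids),
    ).fetchall()
    details_by_id = {
        row[0]: {"id": row[0], "name": row[1], "spotify_id": row[2], "cover_url": row[3]}
        for row in rows
    }
    # IN (...) does not guarantee order, so re-apply the order returned by the vector search.
    return [details_by_id[pid] for pid in playlist_ids if pid in details_by_id]


# Hypothetical schema and rows, held in an in-memory database for illustration.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE playlists (id INTEGER PRIMARY KEY, name TEXT, spotify_id TEXT, cover_url TEXT)"
)
connection.executemany(
    "INSERT INTO playlists VALUES (?, ?, ?, ?)",
    [
        (1, "Morning Jazz", "sp1", "https://example.com/1.jpg"),
        (2, "Late Night Blues", "sp2", "https://example.com/2.jpg"),
        (3, "Road Trip Rock", "sp3", "https://example.com/3.jpg"),
    ],
)
nearest_ids = [3, 1]  # stand-in for the k closest IDs from the vector index
results = fetch_playlist_details(connection, nearest_ids)
print([item["name"] for item in results])
```

A single `IN` query keeps the lookup to one round trip regardless of how many results the vector search returns.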

Thoughts

The Good

Scalable architecture

Using Docker containers to hold our system modules (web server, search, and autocomplete APIs) makes it easy to scale the setup if needed.

Memory friendly setup

We chose SQLite as our database and USearch as our vector search library due to their lower memory footprint and MMAP-based implementation. Since this system is read-only, we do not require the concurrency features offered by enterprise databases like MySQL or PostgreSQL. Regarding vector search, the mmap support allows us to avoid loading the vector search index into memory, which helps conserve system RAM. While the performance may not match memory-based alternatives, it is sufficient for our needs.

Robust traffic support by using Nginx

Using Nginx enables robust support for traffic management, whether it is rate limiting to mitigate DDoS attacks (or any volume of traffic beyond what our application can handle), caching (for efficient utilization of system resources), or serving static resources such as images, CSS, and JS files.

The Bad

Autocomplete Memory Consumption

The memory consumption of the autocomplete API can be pretty high under heavy load. Considering mmap-based alternatives may be beneficial in this case.

No HTTPS support

The v1.0.0 setup doesn't support HTTPS, so the setup requires something like a Cloudflare tunnel for HTTPS support.

Additionally, there is no built-in HTTPS support for the search and autocomplete APIs. The Node.js web server communicates directly with the API containers without a proxy in place to manage SSL or other networking rules. This means that all traffic between the web server and the APIs is unencrypted.

No (Auto)scaling (Yet)

Although the application has been designed to support scaling, the current version, v1.0.0, still requires a scaling configuration, either Kubernetes-based or another approach.

Conclusion

We designed a playlist search application powered by the embeddings from the sequence-to-sequence model we described in our paper. The setup is low-cost, enabling it to be deployed on a Raspberry Pi while still being designed to be scalable if needed. This version does depend on an HTTPS frontend (such as a Cloudflare tunnel) and does not yet have any scaling configuration.

Until the next iteration.

Notes

*Memory-mapped I/O (mmap) is a technique that allows a file or a portion of a file to be directly mapped into the memory address space of a process. This enables applications to access the file's contents as though they were part of the program's memory, facilitating efficient file input/output (I/O) operations.
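As a small illustration of the idea, the sketch below writes a few vectors to a file and then reads a single one back through Python's standard `mmap` module, without reading the whole file into a Python object. The file layout (fixed-size little-endian float32 vectors) and the function names are assumptions made for this example. They are not how USearch stores its index.

```python
import mmap
import os
import struct
import tempfile

DIMENSIONS = 4
VECTOR_BYTES = DIMENSIONS * 4  # four float32 values per vector


def write_vectors(path, vectors):
    """Write vectors back to back as little-endian float32 values."""
    with open(path, "wb") as handle:
        for vector in vectors:
            handle.write(struct.pack(f"<{DIMENSIONS}f", *vector))


def read_vector(path, index):
    """Read one vector through a memory map instead of loading the whole file."""
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            offset = index * VECTOR_BYTES
            return struct.unpack_from(f"<{DIMENSIONS}f", mapped, offset)


path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
write_vectors(path, [(0.0, 0.5, 1.0, 1.5), (2.0, 2.5, 3.0, 3.5)])
second = read_vector(path, 1)
print(second)
```

The operating system pages in only the parts of the file that are touched, which is what keeps resident memory low for a large index.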

**While Nginx could also have been run as a Docker container, we decided to install it on the host machine so that it does not depend on Docker being up. This lets Nginx show a maintenance page during any Docker upgrades.

Next Stop, Vector Databases: Building a Music Discovery App - 3

Mon, 05 Feb 2024 00:00:00 +0000

Disclaimer 1: This is the third instalment in the How Not to Build a Music Discovery App series, based on our paper titled Bit of This, Bit Of That: Revisiting Search and Discovery. In Part 1, we present the initial monolithic version, and in Part 2, we discuss the transition to a microservice-based architecture.

Disclaimer 2: The design choices mentioned in this article are made with a low-cost setup in mind. As a result, some of the design choices may not be the most straightforward.

Disclaimer 3: You can explore our music discovery platform, This & That Music, here: https://discover.thisandthatmusic.com/.

Brief Summary

Genre-fluid music is any musical item (song or a playlist) which contains more than a single genre. Think Stressed Out by Twenty One Pilots. Or Peaches En Regalia by Frank Zappa. Genre-fluid music has been gaining popularity over the last few decades. However, the search interfaces in music apps like Spotify and Apple Music are still designed for single-genre searches. Our paper proposes a platform to discover genre-fluid music through a combination of expressive search and user experience created around the core idea of genre-fluid search.

Part 1 of this series outlines the initial monolithic architecture used to build this platform. In Part 2, we break the monolith into three components: web server, discovery engine, and vector search server, using PostgreSQL for keyword search, Spotify ANNOY library for sparse genre-vector search, Gensim Word2vec for similarity-based search, and Redis as the cache storage.

Previously designed microservice-based architecture (version2).

Limitations

In design version2, we broke the monolithic architecture into smaller components to simplify horizontal scaling. However, there were some accompanying design issues as well. These are as follows:

Too Many Search & Lookup Sources

Our search/lookup source schema looks like the following:

PostgreSQL for full-text search
Redis for storing cached and app data
Spotify ANNOY data structures for sparse vector search
Gensim library for dense vector search
In-memory sparse genre-vectors needed by the scoring module

That seems a bit much. Having this many search/lookup sources makes the system harder to manage, particularly when it comes to scaling and performance tuning.

Sub-Optimal Vector Database Implementation

Our vector search component, which uses the Spotify ANNOY library and the Gensim library for vector search, packaged as a FastAPI service, is not the most optimized vector-search component.

Spotify ANNOY was already a mid-tier vector search library in terms of speed in 2023, and running it in mmap mode on top of that increased the search time further.

In addition, we also have the same problem of using two different libraries for vector search where using one vector database would have sufficed.

Redis Overhead

Redis is a very convenient caching solution, but as the application data schema becomes more complex, we start to see code snippets like this everywhere:

all_integer_items = [int(item.decode('utf-8')) for item in all_cached_items]

We must decode the byte response and cast it to our required data type, as Redis stores the data as strings and the client hands it back as bytes.

Additionally, Redis works best with flat storage (list, set, or dictionary) and is not designed to store nested data. This makes Redis a less-than-ideal data storage (or caching) candidate as the application data becomes more complicated.
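To make the overhead concrete without requiring a running Redis server, the sketch below uses a tiny stand-in class that, like a byte-string key-value store, only accepts and returns flat values. `BytesOnlyStore` is hypothetical and is not a Redis client. It only mimics the constraint that nested data must be serialized before storage and parsed again after retrieval.

```python
import json


class BytesOnlyStore:
    """Stand-in for a byte-string key-value store; not a Redis client."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        if not isinstance(value, (bytes, str)):
            raise TypeError("values must be bytes or str")
        self._data[key] = value.encode("utf-8") if isinstance(value, str) else value

    def get(self, key):
        return self._data.get(key)


store = BytesOnlyStore()
playlist = {"id": 42, "genres": {"rock": 0.7, "blues": 0.3}, "track_ids": [1, 2, 3]}

# Nested data has to be flattened to a string on the way in...
store.set("playlist:42", json.dumps(playlist))
# ...and decoded and parsed again on the way out.
restored = json.loads(store.get("playlist:42").decode("utf-8"))
print(restored["genres"]["rock"])
```

Every read and write of structured data pays for this round trip, which is the boilerplate a document store removes.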

Finally, since we know that making multiple calls vs making a single bulk call can make all the difference from a performance perspective, Redis also provides limited support in that context by only supporting bulk GET queries for dictionaries and not lists.

Design Changes

Based on the abovementioned limitations, we can make the following design changes:

Replacing Redis With MongoDB

In the design versions 1 and 2, we used Redis for caching and storing the application data. This worked fine until we ran into the problems of having to write additional serialization/deserialization code for storing and using the retrieved data from Redis, having no direct support for storing nested data, and limited bulk query capabilities.

We can solve all those problems by moving to a NoSQL solution, such as MongoDB*. By doing so, we get two advantages straight off the bat:

No more serialization/deserialization overhead.
No more data storage format restrictions. We can store our data as JSONs with support for nesting as well.

We also get one more benefit by migrating to MongoDB: With its new storage engine, WiredTiger, we can choose the amount of memory to be allocated to it, meaning we can control the performance-memory tradeoff by controlling the amount of memory allocated to MongoDB.

Using Elasticsearch For All-Things-Search

Instead of using two separate solutions (PostgreSQL and Spotify ANNOY/Gensim) for full-text search and vector search, we can use a service which supports both, such as Elasticsearch. Elasticsearch has existed for a long time as a distributed full-text search engine, and it has added vector search functionality over the past few years. Using it as our out-of-the-box vector database gives us a well-established, stable, and highly optimized service for our search use cases, making our application design much cleaner.

Version3 design architecture. Search source aggregation achieved by using Elasticsearch for full-text and vector search. MongoDB replaces Redis for caching and application data storage.

Search Workflow

The search workflow in version3 remains similar to that discussed in version2.

1. The web server forwards the user query to the core discovery service.
2. The query parser module parses the query, builds a query payload and forwards it to the caching module.
3. The caching module checks whether the query results have already been stored in MongoDB. In case of a cache hit, steps 4–6 are skipped, and the result set is returned to the browser. In case of a cache miss, the query payload is forwarded to the Candidate Aggregation module.
4. The candidate aggregation module sends the query to Elasticsearch for vector search, which returns 10k candidates.
5. The scoring module scores the candidates with respect to the query.
6. The filtering module removes duplicate candidates with regard to the item name and primary artist composition.
7. The detail population module finally populates the result set candidates.
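The cache check, candidate aggregation, scoring, and filtering steps can be summarized in a short Python sketch. This is a simplified illustration under several assumptions: a plain dictionary stands in for the MongoDB cache, `fake_fetch_candidates` stands in for the Elasticsearch call, `fake_score` stands in for the scoring module, and the field names are hypothetical. Detail population is left out.

```python
def search(query, cache, fetch_candidates, score):
    """Return cached results on a hit; otherwise aggregate, score, de-duplicate, and cache."""
    if query in cache:
        return cache[query]
    candidates = fetch_candidates(query)
    ranked = sorted(candidates, key=lambda item: score(query, item), reverse=True)
    seen = set()
    results = []
    for item in ranked:
        key = (item["name"].lower(), item["primary_artist"].lower())
        if key in seen:
            continue  # drop duplicates by item name and primary artist
        seen.add(key)
        results.append(item)
    cache[query] = results
    return results


calls = []  # records how often the (stand-in) search backend is hit


def fake_fetch_candidates(query):
    calls.append(query)
    return [
        {"name": "Blue Rock", "primary_artist": "A", "similarity": 0.9},
        {"name": "blue rock", "primary_artist": "a", "similarity": 0.8},
        {"name": "Rocky Blues", "primary_artist": "B", "similarity": 0.7},
    ]


def fake_score(query, item):
    return item["similarity"]


cache = {}
first = search("rock blues", cache, fake_fetch_candidates, fake_score)
second = search("rock blues", cache, fake_fetch_candidates, fake_score)
print([item["name"] for item in first], len(calls))
```

The second call is served from the cache, so the stand-in backend is queried only once.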

Thoughts

The Good

Search Source Aggregation

What this setup succeeds in achieving is search source aggregation. We replace PostgreSQL, Spotify ANNOY, and Gensim entirely with Elasticsearch, making our design much cleaner and easier to manage in terms of infrastructure management, scaling, and performance tuning.

MongoDB Convenience

By using MongoDB in place of Redis for caching and application data storage, we now have data stored in a JSON format that closely resembles how we use the data in our application. We also no longer need to cast the data back to its intended data types, resulting in cleaner code. All of this comes with an option to specify the memory allocated to MongoDB, which lets us fit this setup to the hardware resources available.

The Bad

Incomplete In-Memory Cleanup

The in-memory cleanup remains incomplete, with the sparse genre vectors still held in memory. We can store those in MongoDB, but even with sufficient memory allocated, fetching genre vectors from MongoDB would still take longer than an in-memory lookup.

10k Scoring Time

Since the beginning, one persistent issue with the scoring module has been that it takes longer than expected to calculate scores for 10,000 candidates. This results in an overall increase in search response time.

The Ugly

While Elasticsearch is better than Spotify ANNOY (mmap mode) in terms of search performance and the convenience of handling both vector search and full-text search, the memory consumption in this setup went past all our self-imposed restrictions. With a total index size of around 25GB, we must allocate a whole new 64GB server for search, which amounts to roughly $576 monthly for running a self-hosted single instance of Elasticsearch. Ouch!

Conclusion

We cleaned up the application design by using Elasticsearch for both full-text and vector search queries, although this came at the expense of memory consumption. We also replaced Redis with MongoDB for caching and data storage purposes, thus aligning the data storage format (JSON) with the usage format. Going forward, we must reduce costs significantly while keeping the design clean. The 10k scoring time also needs to be substantially reduced, and the memory cleanup remains to be completed.

Until the next iteration.

*In place of MongoDB, we can also use RedisJSON, a Redis module that supports storing and retrieving data as JSON documents.

Ready to explore genre-fluid music? Visit our music discovery platform, This & That Music, here: https://discover.thisandthatmusic.com/

Say Hello To Microservices: Building a Music Discovery App - 2

Mon, 15 Jan 2024 00:00:00 +0000

Disclaimer 1: This post continues the last post about building a music discovery platform based on our paper: Bit Of This, Bit Of That: Revisiting Search and Discovery.

Disclaimer 2: The design choices mentioned in this article are made with a low-cost setup in mind. As a result, some of the design choices may not be the most straightforward.

Disclaimer 3: You can explore our music discovery platform, This & That Music, here: https://discover.thisandthatmusic.com/.

Brief Summary

Genre-fluid music is any musical item (song or a playlist) which contains more than a single genre. Think Old Town Road. Or Linkin Park (Nu-Metal genre). Genre-fluid music has been gaining popularity over the last few decades. However, the search interfaces in music apps like Spotify and Apple Music are still designed for single-genre searches. Our paper proposes a platform to discover genre-fluid music through a combination of expressive search and user experience created around the core idea of genre-fluid search.

Part 1 of this series outlines the initial system architecture (version1) used to build this platform. In version1, we use a monolithic architecture with in-memory lookup objects as data sources, PostgreSQL for keyword search, Spotify ANNOY library for sparse genre-vector search, Gensim Word2vec for similarity-based search, and finally, package the whole system as a Python Flask application.

Previously designed version1 architecture. Monolithic in nature, with in-memory objects as the primary data source.

Limitations

While the main strength of version1 design is its simplicity of implementation, there are quite a few shortcomings as well. These are as follows:

In-memory litter: Instead of aggregated data sources such as Redis or MongoDB, we have multiple in-memory objects, which give memory a disjointed look.

High Memory Consumption: Due to the in-memory objects that cannot be shared across multiple application instances, horizontal scaling becomes difficult.

Monolith Problems: Setting up the whole design as a monolith makes the scaling challenges even worse.

Significant Search Time: Searching the ANNOY search trees sequentially increases search time, worsening the user experience.

Design Refactoring

The best way to refactor the design would be to consider the abovementioned limitations and make changes accordingly.

Reducing Memory Consumption

We can start by migrating the in-memory app data (entity data) to Redis. It adds a bit of serialisation/deserialisation overhead, but on the plus side, our in-memory data can now be shared across multiple app instances.
We can use the mmap mode for Spotify ANNOY search to reduce memory consumption further. It searches the search tree without loading it into memory, reducing memory consumption. The downside of this is relatively slower (but still okay) search times.

Breaking the Monolith

As part of breaking the monolith to make scaling more manageable, we can remove the vector search component from the main application and make it its own service—our own vector database.

We further divide the main application into two parts:

The component that contains the core search logic, which we call the Core Discovery Service
The public-facing web component that forwards requests to the core discovery service

This leaves us with three separate services: web, core discovery, and vector search.

Upgrading The Query Parser Module

For the final piece of refactoring, we can upgrade the Query Parser Module by using Named Entity Recognition as its core component and moving it from the JavaScript side to the core discovery service. The purpose of this module is to automatically identify and extract genres (Rock Blues playlists), their quantifiers (Blues playlists with a little Rock), and other related entities, such as artists, from the user-written query and pass them on to the search workflows.

With this upgrade, the user no longer needs to specify a search mode for their query explicitly. This module can intelligently decide if the query is to be categorised as a keyword or genre-based search and appropriately forward the request to PostgreSQL or search-related modules.

Version2 design architecture. Microservice-based.

Services

Web

This service, packaged as a FastAPI application, is the web layer of the system. It takes in the query from the browser and forwards the requests to the core discovery service endpoints. This way, we keep our core discovery service client-agnostic. It performs the following functions:

Input validation
Request Authentication
User input conversion to an appropriate payload as accepted by the discovery service.

We can also customise this layer to add support for multiple client-specific workflows, keeping the discovery service untouched. Also, we can horizontally scale it with ease and keep it behind a reverse proxy solution such as NGINX for even better performance.

Core Discovery

This is the main application layer of the system containing core search logic. It is also packaged as a FastAPI application, is not public-facing, and only accepts requests from the web layer over HTTP. It has the following data sources:

Genre sparse vectors, stored in memory.
Redis for the application data (entity lookups) needed for scoring and item detail population modules.
PostgreSQL for keyword-based searches.

This service can be scaled by using multiple Uvicorn workers or packaging it in a Docker container and using something like Kubernetes to manage multiple Docker containers.

Vector Search

This is the last layer in our system containing code for vector search. It can be viewed as our custom-implemented vector database. This is also packaged as a FastAPI application accepting vector search requests from the core discovery service using HTTP protocol.

Each genre vector search spawns two processes to search the dot and angular metric trees, combines their results, and returns them to the core discovery service. Spawning separate processes parallelises the search workflow, cutting the search time in half. This layer can also be scaled using Docker or Uvicorn workers. The ANNOY vector trees can still be shared among multiple processes using mmap.
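The pattern of running two searches side by side and merging their results can be sketched as follows. Two substitutions keep the example self-contained, and neither reflects the actual service: brute-force dot-product and cosine searches replace the ANNOY trees, and a thread pool replaces the separate processes described above. In CPython, threads would not speed up pure-Python scoring like this, so the sketch shows the structure only. All names and vectors are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def top_k_dot(query, vectors, k):
    """Brute-force stand-in for a search over a dot-product tree."""
    scores = {key: sum(q * v for q, v in zip(query, vec)) for key, vec in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def top_k_angular(query, vectors, k):
    """Brute-force stand-in for a search over an angular (cosine) tree."""
    def cosine(vec):
        norm = math.sqrt(sum(q * q for q in query)) * math.sqrt(sum(v * v for v in vec))
        return sum(q * v for q, v in zip(query, vec)) / norm if norm else 0.0

    scores = {key: cosine(vec) for key, vec in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def combined_search(query, vectors, k):
    """Run both searches concurrently and merge their results, keeping first-seen order."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        dot_future = pool.submit(top_k_dot, query, vectors, k)
        angular_future = pool.submit(top_k_angular, query, vectors, k)
        merged = dot_future.result() + angular_future.result()
    return list(dict.fromkeys(merged))  # de-duplicate while preserving order


genre_vectors = {"a": (3.0, 0.0), "b": (1.0, 1.0), "c": (0.0, 2.0)}
combined = combined_search((1.0, 1.0), genre_vectors, k=1)
print(combined)
```

The two metrics can rank items differently (here the dot product favours the longest vector, the angular metric the best-aligned one), which is why the results are merged rather than taken from one search alone.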

Search Workflow

The search workflow in version2 remains similar to the one discussed in version1.

1. The web server forwards the user query to the core discovery service.
2. The query parser module parses the query, builds a query payload and forwards it to the caching module.
3. The caching module checks whether the query results have already been stored in Redis. In case of a cache hit, steps 4–6 are skipped, and the result set is returned to the browser. In case of a cache miss, the query payload is forwarded to the Candidate Aggregation module.
4. The candidate aggregation module sends the query to the vector search service, which returns the candidates from the ANNOY search trees.
5. The scoring module scores the candidates with respect to the query.
6. The filtering module removes duplicate candidates with regard to the item name and primary artist composition.
7. The detail population module finally populates the result set candidates.

Thoughts

The Good

Memory Consumption

We free up over 7 GB of application memory by moving the in-memory app data to Redis, and around 3 GB more by using mmap for the ANNOY search, although the latter comes at the expense of some speed.

Scalability

Now that we have broken down the monolith into the web, core discovery, and vector search services, this design version, version2, lends itself far better to horizontal scaling than the previous version, as we can scale the services independently.

The Bad

Incomplete Memory Cleanup

Since we need the genre vectors for the main scoring module, they must be in memory for the fastest possible retrieval. So, we are still left with some lookup data structures in memory. This data, as before, cannot be shared with other instances of the core discovery service, making for suboptimal horizontal scaling.

Redis Overhead

Redis does make our data shareable across instances, but not without some added overhead. And as we move some of our app data from in-memory to Redis, it becomes more evident.

The first of these is the serialisation/deserialisation overhead. Compared to in-memory objects, Redis cannot store data with the same freedom (no support for integer keys or nested objects), leading to serialisation/deserialisation code all over the place.
Secondly, bulk retrievals can become time-consuming compared to in-memory lookups, especially when storing lists. For example, if we store vectors in Redis as lists, there is no single bulk command for lists comparable to MGET for strings.
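To make the serialisation overhead concrete, here is a minimal sketch. A plain dictionary stands in for Redis so the snippet runs without a server, and the key prefix, helper names, and sample entities are all hypothetical rather than taken from the actual application.

```python
import json

# Stand-in for a Redis connection: like Redis, it only holds string keys
# and string values. (Assumption: a real deployment would use a Redis client.)
fake_redis: dict[str, str] = {}

# In-memory lookup as a monolith might hold it: integer keys, nested values.
# The entity ids and fields below are made up for illustration.
entity_details = {
    101: ["Playlist A", "https://example.com/a.mp3", "https://example.com/a.jpg"],
    102: ["Playlist B", "https://example.com/b.mp3", "https://example.com/b.jpg"],
}


def redis_key(entity_id: int) -> str:
    """Turn an integer id into a namespaced string key."""
    return f"entity:{entity_id}"


def store_entity(entity_id: int, details: list[str]) -> None:
    """Serialise the nested value to JSON before writing it."""
    fake_redis[redis_key(entity_id)] = json.dumps(details)


def load_entities(entity_ids: list[int]) -> dict[int, list[str]]:
    """Bulk lookup: every value has to be deserialised on the way out."""
    result = {}
    for entity_id in entity_ids:
        raw = fake_redis.get(redis_key(entity_id))
        if raw is not None:
            result[entity_id] = json.loads(raw)
    return result


for entity_id, details in entity_details.items():
    store_entity(entity_id, details)
```

Every read and write now passes through a conversion step that a plain in-memory dictionary never needed, which is the overhead described above.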

Sub-Optimal Vector Search Service

The problem with our vector search service is that it is just like a vector database minus all the optimisations provided by the out-of-box solutions. Everything has plenty of scope for improvement, ranging from communication to serialisation/deserialisation protocols, from storage to search implementations.

Conclusion

We broke the monolith outlined in the design version1 into three smaller components. This new version enables smoother horizontal scaling, and the memory footprint is far more consolidated than in the previous version. The vector search service, however, looks as if it has been stitched together like Frankenstein's monster. The Redis overhead also needs to be addressed by replacing Redis with another NoSQL store. Another area for improvement is consolidating the search/retrieval sources, which currently include PostgreSQL, Redis, Spotify ANNOY search trees, Gensim word2vec indices, and the core discovery service's in-memory data.

Until the next iteration.

Ready to explore genre-fluid music? Visit our music discovery platform, This & That Music, here: https://discover.thisandthatmusic.com/

Of Monoliths & In-Memory Litter: Building A Music Discovery App - 1

Wed, 10 Jan 2024 00:00:00 +0000

Disclaimer 1: This post is the first post in the series about building a music discovery platform based on our paper: Bit Of This, Bit Of That: Revisiting Search and Discovery.

Disclaimer 2: The design choices mentioned in this article are made keeping in mind a low-cost setup. As a result, some of the design choices may not be the most straightforward ones.

Disclaimer 3: You can explore our music discovery platform, This & That Music, here: https://discover.thisandthatmusic.com/.

The term genre-fluid music sounds so fancy and sophisticated and something you aren't supposed to know, yet you do. Even if you didn't realize you did, you do. After all, it's all around us. Think Old Town Road by Lil Nas X. How would you describe its genre? Is it country, or is it hip-hop? Or is it both? And that's what genre fluidity is: any musical item consisting of more than a single genre.

Over the past few decades, genre fluidity has been gaining prominence in the Billboard charts. However, search interfaces have yet to keep up with this trend and continue to be designed for single-genre searches, while genre fluidity is served mostly via recommendations and curated playlists. This may soon change as recent advancements in AI, specifically Large Language Models (LLMs), offer more expansive and expressive access to content.

So why are we talking about genre fluidity and old-town roads? A few years ago, we published a paper proposing a platform to discover genre-fluid music through a combination of expressive search and a user experience created around the core idea of genre-fluid search. This post explains the initial architecture we used to build that discovery system and the lessons we learned.

Search

Our proposed system enables users to search for music by combining different genres and specifying the proportion of each genre they want in the results. We call it genre-fluid search. Adopting the design principles of existing apps, we keep the existing keyword search ("Adele playlists") along with our proposed search to keep the cognitive load as low as possible.

System search modes: Keyword search mode is the conventional search mode, while genre search mode is the proposed one.

Data Summary

Before going into the depth of the architecture, here is a summary of the data used for creating the system:

There are a total of 88 genres supported by the system.
There are three entity types: artists, playlists, and tracks.
The system has around one million playlists, seven million tracks, and a million artists.
Each entity has demographic data, such as name, popularity, preview audio link, and cover image.

Embeddings

We use neural embeddings for our search and similarity-based recommendation purposes. There are two types of embeddings in our system:

Genre-based embeddings
Similarity-based embeddings

To get the genre-based embedding for each entity, we annotate entities with their associated genres so each entity can be represented as an 88-length sparse vector. This vector consists of real values, meaning the nonzero values range between 0 and 1. In addition to entity-genre annotation, we make use of genre relationships and have those relationships reflected in the entity genre annotation.

Similarity-based embeddings are obtained by training word2vec models on our corpus, treating playlists as sentences and tracks (and artists) as words. These embeddings are 300-length vectors, with values ranging between 0 and 1. Finally, to perform Approximate Nearest Neighbor search on our embeddings, we use the Spotify ANNOY library and store the indexes for each entity on disk.

Data Structures And Storage

The main consideration for our search workflow is retrieval as fast as possible, which means it should be memory-based instead of disk or mmap-based. We use the Python pickle module to store our entity details objects as a dictionary, with entity-id being our key and entity details stored in an array ([name, preview audio link, cover image link]). Popularity lookups are stored in the same format as well. We store the sparse genre embeddings as H5 files. Word2vec and Spotify ANNOY data structures are saved as their corresponding supported files.

System Modules

The main components of this system, packaged as a Python Flask application, are:

Query Parser Module
Candidate Aggregation Module
Scoring Module
Filtering Module
Detail Population Module
Caching Module

Query Parser Module

This module converts the search query typed in natural language ("rock playlists with some country") into an 88-length genre vector. We apply the genre-relationship rules mentioned in the embeddings section to the query vector.
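As a rough illustration of the parsing step, the sketch below maps a query to an 88-length vector. The genre positions, the quantifier words, and the weights they imply are all hypothetical, and the genre-relationship rules are left out entirely.

```python
import re

NUM_GENRES = 88
# Hypothetical vocabulary: genre names mapped to vector positions.
GENRE_INDEX = {"rock": 0, "country": 1, "blues": 2}
# Hypothetical quantifier words and the proportion each one implies.
QUANTIFIERS = {"some": 0.3, "little": 0.2, "mostly": 0.8}
DEFAULT_WEIGHT = 1.0


def parse_query(query: str) -> list[float]:
    """Convert a free-text query into an 88-length genre vector."""
    vector = [0.0] * NUM_GENRES
    pending_weight = DEFAULT_WEIGHT
    for token in re.findall(r"[a-z]+", query.lower()):
        if token in QUANTIFIERS:
            # Remember the proportion and apply it to the next genre seen.
            pending_weight = QUANTIFIERS[token]
        elif token in GENRE_INDEX:
            vector[GENRE_INDEX[token]] = pending_weight
            pending_weight = DEFAULT_WEIGHT
        # Any other token (for example "playlists" or "with") is ignored.
    return vector


query_vector = parse_query("rock playlists with some country")
```

Here "rock" receives the default weight and "country" receives the weight attached to "some"; every other position stays at zero.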

Candidate Aggregation Module

Even though we create a tree data structure built with binary space partitioning using the Spotify ANNOY library from our genre embeddings, we use our scoring module instead of the conventional query-to-candidate distance as the final scoring metric. We create two indexes per entity type for genre embeddings to get the maximum possible candidates for a given query: one using the Angular distance and another using the Dot Product distance. We sequentially query the indexes and retrieve N=10,000 candidates from the indexes using the candidate aggregator module.
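The idea of querying two indexes and pooling their candidates can be sketched without ANNOY. In the snippet below, an exhaustive search stands in for each index (an assumption made to keep the example dependency-free), cosine similarity stands in for the angular metric because both produce the same ranking, and the toy vectors are three-dimensional rather than 88-dimensional.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def top_n(query, vectors, similarity, n):
    """Exhaustive stand-in for one index: ids of the n most similar items."""
    ranked = sorted(
        vectors,
        key=lambda item_id: similarity(query, vectors[item_id]),
        reverse=True,
    )
    return ranked[:n]


def aggregate_candidates(query, vectors, n):
    """Query both 'indexes' in sequence and merge ids in first-seen order."""
    merged = {}
    for similarity in (cosine, dot):
        for item_id in top_n(query, vectors, similarity, n):
            merged.setdefault(item_id, None)
    return list(merged)


# Toy item vectors; the ids and values are made up.
vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.2, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.9, 0.9, 0.0],
}
candidates = aggregate_candidates([1.0, 0.5, 0.0], vectors, n=2)
```

Item "b" points in the same direction as the query but has a small magnitude, so only the angular-style ranking surfaces it, while "a" is surfaced only by the dot product. Pooling both lists widens the candidate set handed to the scoring module.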

Scoring Module

We use our custom scoring function to calculate a score for each of those 10,000 candidates, where we consider things such as the candidate values corresponding to each query genre, missing genres in a candidate, additional candidate genres not specified in the query, and candidate popularity.
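The actual scoring function is not reproduced in this post, so the following is only a toy sketch of how the four signals listed above could be combined. The penalty values, the popularity weight, and the function name are assumptions chosen for illustration.

```python
def score_candidate(query, candidate, popularity,
                    missing_penalty=0.5, extra_penalty=0.25,
                    popularity_weight=0.1):
    """Toy score: reward closeness on queried genres, penalise the rest.

    `query` and `candidate` are equal-length genre vectors and
    `popularity` is assumed to be normalised to the range 0..1.
    """
    score = 0.0
    for q, c in zip(query, candidate):
        if q > 0 and c > 0:
            score += 1.0 - abs(q - c)   # closer proportions score higher
        elif q > 0:
            score -= missing_penalty    # queried genre absent from candidate
        elif c > 0:
            score -= extra_penalty * c  # genre the query did not ask for
    return score + popularity_weight * popularity


query = [1.0, 0.3, 0.0]
good = score_candidate(query, [0.9, 0.3, 0.0], popularity=0.5)
extra = score_candidate(query, [0.9, 0.3, 0.8], popularity=0.5)
missing = score_candidate(query, [0.9, 0.0, 0.0], popularity=0.5)
```

With these made-up weights, a candidate that matches both queried genres ranks above one that adds an unrequested genre, which in turn ranks above one that lacks a queried genre.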

Filtering Module

Since our corpus contains user-created playlists, it is possible to have multiple playlists with the same name. Also, many playlists for a given search query can mainly comprise the same artist. To avoid returning duplicate results in their name or primary artist composition, we have a separate module to filter out such items and finally return K=300 result items sorted by their score.
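A minimal sketch of such a filter is shown below. It simplifies "primary artist composition" to a single primary artist per playlist, which is an assumption, and the tuple layout and sample data are hypothetical.

```python
def filter_candidates(scored, k):
    """Keep the best-scoring item per name and per primary artist.

    `scored` is a list of (score, name, primary_artist) tuples; the
    result holds at most k items, sorted by descending score.
    """
    seen_names, seen_artists, kept = set(), set(), []
    for score, name, artist in sorted(scored, key=lambda item: item[0], reverse=True):
        name_key = name.strip().lower()
        if name_key in seen_names or artist in seen_artists:
            continue  # duplicate by name or by primary artist
        seen_names.add(name_key)
        seen_artists.add(artist)
        kept.append((score, name, artist))
    return kept[:k]


scored = [
    (0.9, "Rock Classics", "Queen"),
    (0.8, "rock classics", "AC/DC"),
    (0.7, "Best of Queen", "Queen"),
    (0.6, "Blues Rock", "Gary Moore"),
]
results = filter_candidates(scored, k=300)
```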

Detail Population Module

This is where we get details for each result item and prepare the final paginated result set to be sent back to the client.

Caching Module

Since our system serves results without considering user personalization, each query should get the same response. With this in mind, we can cache our query results using Redis, as it is the simplest and the most common solution. We serialize each query as a string and store the indexes returned by the filtering module as the value for each query. Only the query parser and detail population modules get called for repeated searches, bypassing the rest.
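The cache-hit and cache-miss paths can be sketched as follows. A dictionary stands in for Redis, `run_pipeline` is a placeholder for the aggregation, scoring, and filtering modules, and the key format and result ids are invented for the example.

```python
import json

cache: dict[str, str] = {}   # stand-in for Redis: string keys, string values
calls = {"pipeline": 0}      # counts how often the expensive path runs


def cache_key(query_vector):
    """Serialise the query vector into a deterministic string key."""
    return "query:" + json.dumps([round(value, 4) for value in query_vector])


def run_pipeline(query_vector):
    """Placeholder for candidate aggregation, scoring, and filtering."""
    calls["pipeline"] += 1
    return [42, 7, 19]       # made-up result ids


def search(query_vector):
    key = cache_key(query_vector)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)        # cache hit: skip the expensive modules
    result_ids = run_pipeline(query_vector)
    cache[key] = json.dumps(result_ids)  # cache miss: store for next time
    return result_ids


first = search([1.0, 0.3])
second = search([1.0, 0.3])
```

The second call returns the stored ids without touching the pipeline, so only detail population would remain to be done for a repeated query.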

Overall System Design

System Design Architecture `version1`. Monolithic. Simple.

Here's how the application design looks, zoomed out. The query "Rock playlists with blues" is first entered in a search box in a web application.

The query parser module parses this text and converts it into an 88-length vector.
The query vector is sent to the server, where the caching module checks whether the query results have already been stored in Redis. In case of a cache hit, steps 3-5 are skipped, and the result set is returned to the browser. In case of a cache miss, the query vector is forwarded to the Candidate Aggregator module.
The candidate aggregator module uses this query vector to query the Spotify ANNOY indices, retrieves 10,000 candidates, and passes them on to the scorer module.
The scorer module runs our custom scoring function to calculate scores for each of those 10000 candidates.
The filtering module removes duplicate candidates by name and primary artist. If the resulting set has items under a certain threshold K, it adds the items removed in the filtering stage.
Finally, the demographic data is retrieved for the first k=30 candidates to be returned to the browser.
Candidates calculated are stored in Redis corresponding to the search query.
In order to support existing keyword-based searches, we use the PostgreSQL database as it supports full-text search from disk with a decent performance.

Thoughts

The Good

The best thing about this setup is its simplicity. Its monolithic design is as simple as possible, adding little coding complexity overhead. The straightforward nature of this design makes it an ideal choice for prototyping. The design is also modular, making it easier to move away from monolithic architecture in the future. Lastly, using PostgreSQL over other full-text solutions, such as Elasticsearch, works fine without needing the memory most memory-based full-text search solutions need.

The Bad

In-Memory Litter

Using the pickle module to serialize data to disk is excellent for entry-level applications and prototypes. Pickle files are even more efficient in data storage than other candidates, such as JSON. However, there are better options than pickle when it comes to interoperability, with JSON being one of them. Additionally, moving pickled data across different Python versions can lead to problems.

Another problem that having in-memory objects creates is high memory consumption. While data access is high-speed and straightforward, it also makes deploying the application more complex, as the memory needed for a single instance of the application equals the size of these in-memory objects. This data tied to an instance's memory cannot be shared with other application instances.

Search Time

Owing to our sequential search for Spotify ANNOY indices and the scoring function being run over 10k candidates, the search time is well over 5 seconds for a new query, which is far over the ideal response time.

Caching At A Cost

Caching using Redis is great until it is not. It usually starts with simpler string serialized keys and values, which later becomes a whole thing of writing logic to serialize and deserialize data.

Monolith

The monolithic architecture has notable drawbacks, such as the need to manage, test, and deploy all components as one unit. Additionally, the entire system is bound to a single programming language, which may become less than ideal as the application expands and becomes more complex.

The Ugly

Deploying an application that requires 32GB of memory presents a challenge due to the complexity and high cost of horizontal scaling. For instance, the monthly cost of an AWS t4g.2xlarge instance, which could support such an application, is approximately $180.

Conclusion

The design mentioned above is a decent place to start as it lets the main application idea take focus, owing to its simplicity. Its limiting factors, however, are quite a few and critical as well. Monolith is all well and good up to a certain point, beyond which it stops making sense. In-memory objects lead to substantial memory consumption, complicating the scaling and deployment processes. As we aim to improve this design, our approach would involve reducing memory usage and transitioning from a monolithic structure to a more modular, component-based design.

Until the next design.

Ready to explore genre-fluid music? Visit our music discovery platform, This & That Music, here: https://discover.thisandthatmusic.com/

Building Music Playlists Recommendation System

Mon, 09 Sep 2019 00:00:00 +0000

Quick Summary

Goal

The goal of this work is to represent playlists in a way which captures the true essence of the playlist, i.e. information such as type, genre, variety, order, and the number of songs in the playlist, and which can be used for tasks such as playlist discovery and recommendation.

Contribution

Built a recommendation engine for playlists using sequence-2-sequence learning.
Evaluated the work using a recommendation-based evaluation task.
Assembled a dataset of 1 million Spotify playlists and 13 million tracks for this work.

Applications

Playlist Discovery/Recommendation Engine

Our system can be used for playlist discovery and recommendation. Given a query playlist, the system returns the playlists from the database which are most similar to the input playlist.

Image displaying the usage for our playlist recommendation engine. The system returns playlists most similar to the query playlist

Developer Friendly Outline

Here’s a quick outline of our proposed approach:

Download playlists data using Spotify developer API and everynoise.com.
Filter the data by removing noise (rare songs, duplicate songs, outlier-sized playlists, etc.).
Train a sequence-2-sequence⁹ model over the data to learn playlist embeddings.
Annotate the data (songs and playlists) for genre information.
Evaluate the embeddings using our proposed evaluation tasks.
Build a recommendation engine by populating a KD-tree with the learned playlist embeddings, and retrieving search results by utilizing the nearest neighbor approach.

Introduction

Playlists have become a significant part of our music listening experience today. There are over three billion of these on Spotify alone¹. There are playlists for every moment, every mood, every season, and so on. With millions of songs at their fingertips, users today have grown accustomed² to:

Immediate attainment of their music demands.
An extended experience.

While recommendation engines service the first aspect, playlists handle the second aspect of this changing behavior, making playlist recommendation extremely important, both for the users and music companies.

What is a Playlist, and Why Should I Care?

“A playlist is a set of songs supposed to be listened together, usually in an explicit order.” ³

Playlists are extremely important today from the perspective of both users and music researchers. From the user perspective, playlists are an effective way to discover new music and artists. From the researcher perspective, it is important to understand that music is consumed through listening and playlists formalize that listening experience³. Playlists are a unit component which can be discovered and recommended, just like artists, songs and albums.

So What’s the problem?

As mentioned before, owing to the meteoric rise in the usage of playlists, playlist recommendation is crucial to music services today. Over the past couple of years, however, playlist recommendation has, from a research perspective, become analogous to playlist prediction/creation⁷ ⁸ and continuation⁵ ⁶ rather than playlist discovery. Yet playlist discovery forms a significant part of the overall playlist recommendation pipeline, as it is an effective way to help users discover existing playlists on the platform.

Discovery is the name of the Game

Our work aims to represent playlists in a way which can be used to discover and recommend existing playlists. We use sequence-to-sequence learning⁹ to learn embeddings for playlists that capture their semantic meaning without any supervision. These fixed-length embeddings can then be used for recommendation purposes.

Intuition Behind the Approach

Why Sequence-to-sequence Learning?

The primary intuition behind choosing sequence-to-sequence learning is that playlists can be interpreted just as sentences, and songs as words in a sentence. In the past few years, sequence-to-sequence learning has been widely used to learn effective sentence embeddings in applications like neural machine translation¹⁰. We make use of the relationship playlist:songs :: sentences:words, and take inspiration from research in the field of natural language processing to model playlist embeddings the way sentences are embedded.

Sequence-to-Sequence Learning

The name sequence-to-sequence learning, at its very core, implies that the network is trained to take in sequences and output sequences. So instead of predicting a single word, the network outputs an entire sentence, which could be a translation into a foreign language, the next predicted sentence from the corpus, or even the same sentence if the network is trained like an autoencoder.

For this work, we use the seq2seq framework as an autoencoder, where the task of the network is to reconstruct the input playlist and, in doing so, learn a compact representation of the input playlist that captures its properties.

The overall concept of using seq2seq network like an autoencoder.

Seq2seq Models

Because playlists are relatively long (50–1000 songs), we use the attention technique in the seq2seq models in this work so that the learned playlist embeddings capture the long-term dependencies between the songs in a playlist. We experiment with two variants of seq2seq models:

Unidirectional seq2seq networks
Bidirectional seq2seq networks

A bidirectional seq2seq network is different from the unidirectional variant in the sense that a bidirectional RNN is used, meaning the hidden state is the concatenation of a forward RNN and a backward RNN that read the sequences in two opposite directions. This allows the network to capture more contextual information for the decoder to predict the output symbol.
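The concatenation can be illustrated with a deliberately tiny tanh RNN written in plain Python. This is not the model used in the work: the weights are random, the sizes are arbitrary, attention is omitted, and only the final state of each direction is concatenated, whereas a full bidirectional encoder does this at every time step.

```python
import math
import random


def rnn_final_state(inputs, w_in, w_hidden):
    """Run a minimal tanh RNN over a sequence of input vectors."""
    hidden = [0.0] * len(w_hidden)
    for x in inputs:
        # The comprehension reads the previous hidden state throughout.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * h for w, h in zip(w_hidden[j], hidden))
            )
            for j in range(len(hidden))
        ]
    return hidden


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def bidirectional_state(inputs, forward_params, backward_params):
    """Concatenate the states of a forward pass and a reversed pass."""
    forward = rnn_final_state(inputs, *forward_params)
    backward = rnn_final_state(list(reversed(inputs)), *backward_params)
    return forward + backward


rng = random.Random(0)
HIDDEN, EMBED = 4, 3  # arbitrary sizes for the example
forward_params = (random_matrix(HIDDEN, EMBED, rng), random_matrix(HIDDEN, HIDDEN, rng))
backward_params = (random_matrix(HIDDEN, EMBED, rng), random_matrix(HIDDEN, HIDDEN, rng))
# A made-up "playlist" of five song embeddings.
playlist = [[rng.uniform(-1, 1) for _ in range(EMBED)] for _ in range(5)]
state = bidirectional_state(playlist, forward_params, backward_params)
```

The resulting state is twice the size of a unidirectional one, with one half summarising the playlist read front to back and the other half summarising it read back to front.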

Data: Curation, Filtering, and Annotation

Need

For this work, we need a list of playlists (sentences) with each playlist consisting of a list of songs (words). For solving this problem in its simplest form, we need just the playlist IDs and the song IDs mapped with the appropriate playlist IDs.

Dataset Creation

We download the data using the Spotify Web API. To collect a big enough set of terms to query the Spotify system, we use everynoise.com. This interactive website contains a list of some 2600+ genres, graphed out according to their relationship with each other, along with an audio example for each genre. We parse the data from the home page of this website and get the list of all the genres. Then for each genre, we download playlists (along with the corresponding song information) using the Spotify Web API. The whole flow is shown in Fig 1.

Fig 1: Data download workflow

Here are the downloaded data details:

1 Million Playlists
3 Million Artists
13 Million Tracks
3 Million Unique Tracks
3 Million Albums
2680 Genres

Data Filtering

We follow [1] in cleaning up the data by removing rare tracks and outlier-sized playlists (those with fewer than 10 or more than 5000 songs). This leaves us with 755k unique playlists and 2.4 million unique tracks.

Annotation

Image of a subset of genres taken from everynoise.com

Problem: Genres, Genres Everywhere

Although this step is not directly needed for training, it is crucial for the evaluation phase. This step aims to label the playlists with their appropriate genre. There are certain problems with the information available so far:

Spotify provides genre information for the artist, but not the song. Labeling the song genre the same as the artist genre would not be entirely correct, as an artist can have songs of different genres.
Assigning the songs the same genre as that of the playlist, which in turn is derived from the query term for which it was fetched, would be problematic. The issue in going this route is the subjectivity associated with it. Provided this fine-grained annotation, what would be the difference between soft rock, 80’s rock, classic rock, and rock from the perspective of classification?

Hence, we need to bring down the number of genres (output labels) from 2680 to a more manageable number.

Proposed Solution

Data annotation workflow

To solve for this, we train a word2vec¹¹ model on our corpus to get song embeddings, which capture the semantic characteristics (such as genre) of the songs by their co-occurrence in the corpus.
The resulting song embeddings are then clustered into 200 clusters, an arbitrarily chosen number that attempts to balance the feasibility of the annotation process against the size of the formed clusters (smaller clusters and less annotation time are both desired).
For each cluster:

• Artist genre is applied to each corresponding song and a genre-frequency (count) dictionary is created. A sample genre-count dictionary for a cluster with 17 songs would look like {rock: 5, indie-rock: 3, blues: 2, soft-rock: 7}.

• From this dictionary, the genre having a clear majority is assigned as the genre for all the songs in that cluster.

• All the songs in a cluster with no clear genre majority are discarded for annotation.
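The per-cluster steps above can be sketched with a counter. The post does not say what counts as a "clear majority", so the 50% threshold below is an assumption, as is the small sub-genre to parent-genre mapping.

```python
from collections import Counter

# Hypothetical sub-genre to parent-genre mapping.
PARENT_GENRE = {"indie-rock": "rock", "soft-rock": "rock"}


def cluster_genre(artist_genres, majority=0.5):
    """Return the majority genre of a cluster, or None if there is none.

    `artist_genres` holds one artist-derived genre per song in the cluster.
    """
    if not artist_genres:
        return None
    counts = Counter(PARENT_GENRE.get(genre, genre) for genre in artist_genres)
    genre, count = counts.most_common(1)[0]
    return genre if count / len(artist_genres) > majority else None


# The 17-song sample cluster from the text.
cluster = ["rock"] * 5 + ["indie-rock"] * 3 + ["blues"] * 2 + ["soft-rock"] * 7
label = cluster_genre(cluster)
```

Once the sub-genres are folded into their parent, 15 of the 17 songs in the sample cluster count as rock, so the whole cluster is labeled rock. A cluster split evenly across genres returns `None` and is discarded.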

Based on the observed genre distribution in the data, and as a result of clustering sub-genres (such as soft-rock) into parent genres (such as rock), the genres finally chosen for annotating the clusters are:

Rock, Metal, Blues, Country, Reggae, Latin, Electronic, Hip Hop, Classical.

To validate our approach, we train a classifier on our dataset consisting of annotated song embeddings. With training and test set kept separate at the time of training, we achieve a 94% test accuracy.

t-SNE plot for genre-annotated songs, with 1000 sampled songs for each genre

For playlist-genre annotation, only playlists in which all songs are annotated are considered. Further, a genre is assigned only to those playlists in which more than 70% of the songs agree on a genre.
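The two conditions translate into a short function. The song ids and the `song_genre` lookup below are hypothetical sample data.

```python
from collections import Counter


def playlist_genre(song_ids, song_genre, agreement=0.7):
    """Label a playlist only if every song is annotated and >70% agree."""
    if not song_ids or any(song not in song_genre for song in song_ids):
        return None  # at least one song has no genre annotation
    counts = Counter(song_genre[song] for song in song_ids)
    genre, count = counts.most_common(1)[0]
    return genre if count / len(song_ids) > agreement else None


# Made-up song annotations.
song_genre = {1: "rock", 2: "rock", 3: "rock", 4: "blues"}
```

For example, `playlist_genre([1, 2, 3, 4], song_genre)` yields `"rock"` because three of four songs agree, while a playlist containing an unannotated song, or one split evenly between two genres, gets no label.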

Evaluation

Since the aim of our work is to learn playlist embeddings which can be used for recommendation, we evaluate the quality of embeddings using a recommendation task.

Recommendation Task

Recommendation, being inherently subjective, is best evaluated with user-labeled data. However, in the absence of such annotated datasets, we evaluate our proposed approach by measuring the extent to which the playlist space created by the embedding models is relevant, in terms of the similarity of genre and length information of closely-lying playlists. We use the approximate nearest neighbors algorithm from the Spotify ANNOY library¹³ to populate the tree structure with the playlist embeddings. A query playlist is randomly selected and the search results are compared with the queried playlist in terms of genre and length information. There are nine possible genre labels. For comparing length, ten output classes (spanning the range {30…250}, corresponding to bins of size 20) are created. An average of 100 precision values for each query is considered.
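The precision measure itself is simple to state in code. The sketch below assumes the neighbours have already been retrieved and only compares labels; the function names and sample labels are illustrative, and the same function would apply to length classes as well as genres.

```python
def precision_at_k(query_label, neighbour_labels):
    """Fraction of retrieved neighbours sharing the query playlist's label."""
    if not neighbour_labels:
        return 0.0
    hits = sum(1 for label in neighbour_labels if label == query_label)
    return hits / len(neighbour_labels)


def mean_precision(results):
    """Average precision over (query_label, neighbour_labels) pairs."""
    return sum(precision_at_k(query, found) for query, found in results) / len(results)


single = precision_at_k("rock", ["rock", "rock", "blues", "rock"])
overall = mean_precision([
    ("rock", ["rock", "blues"]),
    ("metal", ["metal", "metal"]),
])
```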

Baseline Comparison

To evaluate the performance of our proposed technique, we need some sort of baseline performance as well. As our baseline model, we experiment with a weighted variant of Bag-of-words model¹⁴, which uses a weighted averaging scheme to get the sentence embedding vectors followed by their modification using singular-value decomposition (SVD). This method of generating sentence embeddings proves to be a stronger baseline compared to traditional averaging.

Results

The recommendation task, as shown in the figure below, captures some interesting insights about the effectiveness of different models for capturing different characteristics. Firstly, high precision values demonstrate the relevance of the playlist embedding space, which is the first and foremost expectation from a recommendation system. Also, BoW models capture genre information better than seq2seq models, while length information is better captured by the seq2seq models, demonstrating the suitability of different models for different tasks.

Applications

One of the direct applications of this work is a recommendation engine for playlists. Given a query, the system would recommend/retrieve similar playlists from the corpus. The tree data structure discussed in the Recommendation Task section can be directly used for this purpose. Given a query playlist, its k-nearest neighbors would be the most similar items to it and would be the system recommendations. A demonstration of our work can be seen in the video.

Recommendation System Demo video

And that’s that. We have presented a seq2seq-based approach for learning playlist embeddings, which can be used for tasks such as playlist discovery and recommendation. Our approach can also be extended for learning even better playlist representations by integrating content-based (lyrics, audio, etc.) song-embedding models, and for generating new playlists by using variational sequence models. The paper covers many more evaluation techniques for assessing the quality of playlist embeddings with respect to the encoded information, which are out of scope for this post. I will be discussing them in another post. Until next time!

P.S — Here’s the link to the paper.

References

[1] https://newsroom.spotify.com/2018-10-10/celebrating-a-decade-of-discovery-on-spotify/
[2] Keunwoo Choi, George Fazekas, and Mark Sandler. Towards playlist generation algorithms using RNNs trained on within-track transitions. arXiv preprint arXiv:1606.02096, 2016.
[3] Ben Fields and Paul Lamere. Finding a path through the jukebox: The playlist tutorial. ISMIR, Utrecht, 2010.
[4] A. M. De Mooij and W. F. J. Verhaegh. Learning preferences for music playlists. Artificial Intelligence, 97(1–2):245–271, 1997.
[5] Ching-Wei Chen, Paul Lamere, Markus Schedl, and Hamed Zamani. RecSys Challenge 2018: Automatic music playlist continuation. In Proceedings of the 12th ACM Conference on Recommender Systems, pages 527–528. ACM, 2018.
[6] Maksims Volkovs, Himanshu Rai, Zhaoyue Cheng, Ga Wu, Yichao Lu, and Scott Sanner. Two-stage model for automatic playlist continuation at scale. In Proceedings of the ACM Recommender Systems Challenge 2018, page 9. ACM, 2018.
[7] Andreja Andric and Goffredo Haus. Automatic playlist generation based on tracking user’s listening habits. Multimedia Tools and Applications, 29(2):127–151, 2006.
[8] Beth Logan. Content-based playlist generation: Exploratory experiments. In ISMIR, 2002.
[9] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, pages 3104–3112, 2014.
[10] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
[11] Tomas Mikolov et al. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[12] Anita Shen Lillie. MusicBox: Navigating the space of your music. PhD thesis, Massachusetts Institute of Technology, 2008.
[13] E. Bernhardsson. ANNOY: Approximate nearest neighbors in C++/Python optimized for memory usage and loading/saving to disk. GitHub, https://github.com/spotify/annoy, 2017.
[14] Sanjeev Arora, Yingyu Liang, and Tengyu Ma. A simple but tough-to-beat baseline for sentence embeddings. 2016.