Say Hello To Microservices: Building a Music Discovery App

Documenting the lessons learnt from building our discovery system outlined in “Bit Of This, Bit Of That: Revisiting Search & Discovery”


Disclaimer 1: This post continues the previous post about building a music discovery platform based on our paper: Bit Of This, Bit Of That: Revisiting Search and Discovery

Disclaimer 2: The design choices mentioned in this article are made with a low-cost setup in mind. As a result, some of the design choices may not be the most straightforward.

Brief Summary

Genre-fluid music is any musical item (a song or a playlist) which contains more than a single genre. Think Old Town Road. Or Linkin Park (Nu-Metal genre). Genre-fluid music has been gaining popularity over the last few decades. However, the search interfaces in music apps like Spotify and Apple Music are still designed for single-genre searches. Our paper proposes a platform to discover genre-fluid music through a combination of expressive search and a user experience built around the core idea of genre-fluid search.

Part 1 of this series outlines the initial system architecture (version1) used to build this platform. In version1, we use a monolithic architecture with in-memory lookup objects as data sources, PostgreSQL for keyword search, Spotify ANNOY library for sparse genre-vector search, Gensim Word2vec for similarity-based search, and finally, package the whole system as a Python Flask application.

Previously designed version1 architecture. Monolithic in nature, with in-memory objects as the primary data source.


While the main strength of version1 design is its simplicity of implementation, there are quite a few shortcomings as well. These are as follows:

In-memory litter: Instead of a consolidated data source such as Redis or MongoDB, we have multiple in-memory objects, which leaves the application data scattered and disjointed.

High Memory Consumption: Due to the in-memory objects that cannot be shared across multiple application instances, horizontal scaling becomes difficult.

Monolith Problems: Setting up the whole design as a monolith makes the scaling challenges even worse.

Significant Search Time: Searching the ANNOY search trees sequentially increases search time, worsening the user experience.

Design Refactoring

The best way to refactor the design would be to consider the abovementioned limitations and make changes accordingly.

Reducing Memory Consumption

  • We can start by migrating the in-memory app data (entity data) to Redis. It adds a bit of serialisation/deserialisation overhead, but on the plus side, our in-memory data can now be shared across multiple app instances.
  • We can use the mmap mode for Spotify ANNOY search to reduce memory consumption further. The index file is memory-mapped, so the search tree is paged in on demand rather than loaded into memory in full. The downside is relatively slower (but still acceptable) search times.
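To illustrate why memory-mapping keeps the footprint low, here is a minimal sketch that uses Python's standard `mmap` module rather than ANNOY itself (ANNOY's documentation describes its `load` call as memory-mapping the index file). The file layout (fixed-size float32 rows), the dimensionality, and the helper names are assumptions made purely for this illustration.

```python
import mmap
import os
import struct
import tempfile

DIM = 4              # hypothetical vector dimensionality
ROW_BYTES = DIM * 4  # one float32 component takes 4 bytes


def write_vectors(path, vectors):
    """Write vectors to disk as consecutive little-endian float32 rows."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack(f"<{DIM}f", *vec))


def read_vector(mapped, index):
    """Read one row from the mapped file; only the touched pages are paged in."""
    return struct.unpack_from(f"<{DIM}f", mapped, index * ROW_BYTES)


def demo():
    """Write 1000 vectors, map the file read-only, and fetch a single row."""
    vectors = [[float(i)] * DIM for i in range(1000)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vectors.bin")
        write_vectors(path, vectors)
        with open(path, "rb") as f, \
                mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return read_vector(mapped, 42)
```

Because the operating system backs the mapping with its page cache, several processes mapping the same file can share the same physical pages, which is what makes this approach attractive when running multiple workers.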

Breaking the Monolith

As part of breaking the monolith to make scaling more manageable, we can remove the vector search component from the main application and make it its own service—our own vector database.

We further divide the main application into two parts:

  • The component that contains the core search logic, which we call the Core Discovery Service
  • The public-facing web component that forwards requests to the core discovery service

This leaves us with three separate services: web, core discovery, and vector search.

Upgrading The Query Parser Module

For the final piece of refactoring, we can upgrade the Query Parser Module by making Named Entity Recognition its core component and moving it from the JavaScript side to the core discovery service. The purpose of this module is to automatically identify and extract genres (Rock Blues playlists), their quantifiers (Blues playlists with a little Rock), and other related entities, such as artists, from the user-written query and pass them on to the search workflows.

With this upgrade, the user no longer needs to specify a search mode for their query explicitly. This module can intelligently decide if the query is to be categorised as a keyword or genre-based search and appropriately forward the request to PostgreSQL or search-related modules.
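As a rough illustration of what the parser produces, the sketch below swaps the Named Entity Recognition model for a plain dictionary lookup. The genre vocabulary, the quantifier phrases, their weights, and the payload fields are all hypothetical; a real NER model would replace the lookup logic.

```python
import re

KNOWN_GENRES = {"rock", "blues", "jazz", "metal"}  # hypothetical vocabulary
QUANTIFIERS = {"a little": 0.3, "a lot of": 0.8}   # hypothetical weights
DEFAULT_WEIGHT = 1.0


def parse_query(query):
    """Return a payload with the search mode and any weighted genres found."""
    text = query.lower()
    genres = {}
    # First pick up genres that directly follow a quantifier phrase.
    for phrase, weight in QUANTIFIERS.items():
        for match in re.finditer(rf"{re.escape(phrase)}\s+(\w+)", text):
            if match.group(1) in KNOWN_GENRES:
                genres[match.group(1)] = weight
    # Any remaining genre mention gets the default weight.
    for token in re.findall(r"\w+", text):
        if token in KNOWN_GENRES and token not in genres:
            genres[token] = DEFAULT_WEIGHT
    # No genre found: treat the query as a keyword search.
    mode = "genre" if genres else "keyword"
    return {"mode": mode, "genres": genres, "raw": query}
```

With these assumptions, "Blues playlists with a little Rock" is routed to the genre workflow with Blues at full weight and Rock at a reduced weight, while "Old Town Road" falls through to keyword search.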

Version2 design architecture. Microservice-based.



Web

This service, packaged as a FastAPI application, is the web layer of the system. It takes in the query from the browser and forwards the requests to the core discovery service endpoints. This way, we keep our core discovery service client-agnostic. It performs the following functions:

  • Input validation
  • Request Authentication
  • User input conversion to an appropriate payload as accepted by the discovery service.

We can also customise this layer to add support for multiple client-specific workflows, keeping the discovery service untouched. Also, we can horizontally scale it with ease and keep it behind a reverse proxy solution such as NGINX for even better performance.
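The validation and payload-conversion duties of the web layer can be sketched as a plain function, independent of the web framework. The length limit, the `page` parameter, and the payload field names are assumptions for illustration; authentication is left out.

```python
MAX_QUERY_LENGTH = 200  # hypothetical limit


class InvalidQuery(ValueError):
    """Raised when the browser input fails validation."""


def build_discovery_payload(raw_query, page=1):
    """Validate browser input and convert it into a discovery-service payload."""
    if not isinstance(raw_query, str):
        raise InvalidQuery("query must be a string")
    query = " ".join(raw_query.split())  # trim and collapse whitespace
    if not query:
        raise InvalidQuery("query must not be empty")
    if len(query) > MAX_QUERY_LENGTH:
        raise InvalidQuery("query is too long")
    if not isinstance(page, int) or page < 1:
        raise InvalidQuery("page must be a positive integer")
    return {"query": query, "page": page}
```

Keeping this logic in the web layer means the discovery service only ever sees well-formed payloads, whichever client produced the original request.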

Core Discovery

This is the main application layer of the system, containing the core search logic. It is also packaged as a FastAPI application, but it is not public-facing and only accepts requests from the web layer over HTTP. It has the following data sources:

  • Genre sparse vectors, stored in memory.
  • Redis for the application data (entity lookups) needed for scoring and item detail population modules.
  • PostgreSQL for keyword-based searches.

This service can be scaled by using multiple Uvicorn workers or packaging it in a Docker container and using something like Kubernetes to manage multiple Docker containers.

Vector Search

This is the last layer in our system, containing the code for vector search. It can be viewed as our custom-implemented vector database. It is also packaged as a FastAPI application and accepts vector search requests from the core discovery service over HTTP.

Each genre vector search spawns two processes that search the dot and angular metric trees, combines their results, and returns them to the core discovery service. Spawning separate processes parallelises the search workflow, cutting the search time roughly in half. This layer can also be scaled using Docker or Uvicorn workers. The ANNOY vector trees can still be shared among multiple processes using mmap.
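The two-process search could be sketched as follows. This is not the service's actual code: brute-force scoring stands in for the ANNOY trees, and merging by union with duplicates removed is an assumption, since the combination rule is not spelled out here. The `executor_cls` parameter is a hypothetical convenience for swapping in a thread pool.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def top_k_dot(query, vectors, k):
    """Return the ids of the k vectors with the highest dot product."""
    scores = {vid: sum(q * v for q, v in zip(query, vec))
              for vid, vec in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def top_k_angular(query, vectors, k):
    """Return the ids of the k vectors with the highest cosine similarity."""
    def cosine(vec):
        dot = sum(q * v for q, v in zip(query, vec))
        norm = (math.sqrt(sum(q * q for q in query))
                * math.sqrt(sum(v * v for v in vec)))
        return dot / norm if norm else 0.0
    scores = {vid: cosine(vec) for vid, vec in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def search_both_metrics(query, vectors, k, executor_cls=ProcessPoolExecutor):
    """Run both searches concurrently and merge the ids, dropping duplicates."""
    with executor_cls(max_workers=2) as pool:
        dot_future = pool.submit(top_k_dot, query, vectors, k)
        angular_future = pool.submit(top_k_angular, query, vectors, k)
        merged = dot_future.result() + angular_future.result()
    return list(dict.fromkeys(merged))  # keeps first-seen order
```

When run as a script with the default process pool, the call should sit under an `if __name__ == "__main__":` guard on platforms that start worker processes by spawning. A long-lived pool created once at service start-up would also avoid paying the process start cost on every request.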

Search Workflow

The search workflow in version2 remains similar to the one discussed in version1.

  1. The web server forwards the user query to the core discovery service.
  2. The query parser module parses the query, builds a query payload and forwards it to the caching module.
  3. The caching module checks whether the query results have already been stored in Redis. In case of a cache hit, steps 4–6 are skipped, and the result set is returned to the browser. In case of a cache miss, the query payload is forwarded to the Candidate Aggregation module.
  4. The candidate aggregation module sends the query to the vector search service, which returns the candidates from the ANNOY search trees.
  5. The scoring module scores the candidates with respect to the query.
  6. The filtering module removes duplicate candidates with regard to the item name and primary artist composition.
  7. The detail population module finally populates the result set candidates.
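The steps above can be sketched as a single cache-aside function. Every callable passed in (`parse`, `fetch_candidates`, `score`, `populate`) is a hypothetical stand-in for the corresponding module, the candidate fields `name` and `artist` are assumed, and `cache` is any object with `get` and `set` methods, such as a Redis client. Following the description of step 3, the sketch caches the filtered candidates so that only steps 4 to 6 are skipped on a hit.

```python
import json


def run_search(query, cache, parse, fetch_candidates, score, populate, top_n=10):
    """Cache-aside search pipeline; every callable is a hypothetical stand-in."""
    payload = parse(query)                                   # step 2
    cache_key = "search:" + json.dumps(payload, sort_keys=True)
    cached = cache.get(cache_key)                            # step 3
    if cached is not None:
        unique = json.loads(cached)                          # hit: skip steps 4-6
    else:
        candidates = fetch_candidates(payload)               # step 4
        ranked = sorted(candidates,
                        key=lambda c: score(payload, c),
                        reverse=True)                        # step 5
        seen, unique = set(), []
        for cand in ranked:                                  # step 6
            identity = (cand["name"].lower(), cand["artist"].lower())
            if identity not in seen:
                seen.add(identity)
                unique.append(cand)
        unique = unique[:top_n]
        cache.set(cache_key, json.dumps(unique))
    return [populate(c) for c in unique]                     # step 7
```

Building the cache key from the parsed payload rather than the raw query string means two differently worded queries that parse to the same payload share one cache entry.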


The Good

Memory Consumption

We free up over 7 GB of application memory by transferring the in-memory app data to Redis, and around 3 GB more by using mmap for ANNOY search, although the latter comes at the expense of some speed.


Now that we have broken down the monolith into the web, core discovery, and vector search services, version2 lends itself far better to horizontal scaling than the previous version, as we can scale the services independently.

The Bad

Incomplete Memory Cleanup

Since we need the genre vectors for the main scoring module, they must be in memory for the fastest possible retrieval. So, we are still left with some lookup data structures in memory. This data, as before, cannot be shared with other instances of the core discovery service, which makes horizontal scaling suboptimal.

Redis Overhead

Redis does make our data shareable across instances, but not without some added overhead. And as we move some of our app data from in-memory to Redis, it becomes more evident.

  • The first of these is the serialisation/deserialisation overhead. Compared to in-memory objects, Redis cannot store the data with the same freedom (no support for integer keys or nested objects), leading to Redis serialisation/deserialisation code all over the place.
  • Secondly, bulk retrievals can become time-consuming compared to in-memory lookups, especially when storing lists. For example, if we store vectors in Redis as lists, there is no list counterpart to MGET, so each list needs its own LRANGE call.
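Both kinds of overhead can be made concrete with a short sketch. The first two helpers show the JSON round trip needed for an integer-keyed, nested lookup table; the third shows one common way to soften the bulk-retrieval cost, namely queuing the LRANGE calls on a pipeline so they travel in a single round trip. The helper names are hypothetical, and `client` is assumed to expose redis-py's `pipeline()`, `lrange()` and `execute()` calls.

```python
import json


def dump_entity(entity):
    """Serialise a nested dict to a JSON string (integer keys become strings)."""
    return json.dumps(entity)


def load_entity(raw):
    """Deserialise and restore the integer keys that JSON turned into strings."""
    return {int(key): value for key, value in json.loads(raw).items()}


def fetch_vectors(client, keys):
    """Fetch many list values in one round trip by pipelining LRANGE calls."""
    pipe = client.pipeline()
    for key in keys:
        pipe.lrange(key, 0, -1)  # queued on the pipeline, not sent yet
    rows = pipe.execute()        # all queued commands are sent together
    # Redis returns list items as strings or bytes, so convert them back.
    return {key: [float(x) for x in row] for key, row in zip(keys, rows)}
```

Pipelining reduces the number of network round trips, but the per-item conversion back to floats remains, so it narrows the gap with in-memory lookups without closing it.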

Sub-Optimal Vector Search Service

The problem with our vector search service is that it is just like a vector database minus all the optimisations provided by out-of-the-box solutions. Everything has plenty of scope for improvement, from communication to serialisation/deserialisation protocols, and from storage to search implementations.


We broke the monolith outlined in design version1 into three smaller services. This new version enables smoother horizontal scaling, and the memory layout is much more consolidated than before. The vector search service, however, looks as if it has been stitched together like Frankenstein's monster. The Redis overhead also needs to be addressed by replacing Redis with a different NoSQL store. Another area for improvement is consolidating the search/retrieval sources, which currently include PostgreSQL, Redis, Spotify ANNOY search trees, Gensim word2vec indices, and the in-memory data of the core discovery service.

Until the next iteration.

Software Developer

My research interests include music information retrieval, recommendation systems and web.