{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Recommendation Systems" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "1. https://en.wikipedia.org/wiki/Recommender_system" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Intro\n", "\n", "A recommendation system, as the name suggests, generates recommendations of items for a product (movies, shopping goods, courses, etc.).\n", "\n", "\n", "For users, recommendation systems\n", "\n", "1. help them find related content.\n", "2. help them explore new items.\n", "3. improve their decision making." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a product, recommendation systems\n", "\n", "1. increase user engagement.\n", "2. help the business learn more about its customers.\n", "3. can change user behaviour." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some types of recommendation systems\n", "\n", "- Collaborative filtering\n", "- Content-based filtering\n", "- Knowledge-based\n", "- Hybrid systems\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Content Based Filtering" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This method uses the description of each item and a profile of the user's preferences. It is best suited to cases where we have descriptive information about the item, such as \n", "name, location, description, etc.\n", "\n", "We focus on the item's information and build a user-specific classifier of likes and dislikes based on the item's features.\n", "\n", "A `user profile` is built to indicate the type of items that this user likes. The algorithm tries to give recommendations based on the user's preferences, such as items liked in the past or items being examined in the present. 
Various candidate items are compared with items previously rated by the user, and the best-matching items are recommended.\n", "\n", "Information retrieval and information filtering play an important role here.\n", "\n", "\n", "- A user profile needs two things -\n", "    - A model of the user's preferences.\n", "    - A history of the user's interactions with the recommender system.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some algorithms that are used in the process - \n", "\n", "1. TF-IDF vectorizer, Word2Vec, BERT, GPT-3 (vector space representations of item text)\n", "2. Bayesian classifiers, cluster analysis, decision trees and neural networks, to estimate the probability that the user is going to like the item." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pros\n", "\n", "- Unlike collaborative filtering, content-based filtering doesn’t need extensive data from other users to create recommendations. Once a user has searched for and browsed a few items and/or completed some purchases, a content-based filtering system can begin making relevant recommendations.\n", "- Content-based recommenders can be highly tailored to the user’s interests, including recommendations for niche items, because the method relies on matching the characteristics or attributes of a database object with the `user’s profile`.\n", "- Recommendations are transparent to the user. Highly relevant recommendations project a sense of openness to the user, bolstering their trust in the recommendations offered. By comparison, with collaborative filtering, users are more likely to not understand why they see specific recommendations. For example, let’s say a group of users who purchased an umbrella also happened to buy down puffer coats. A collaborative system may recommend down puffer coats to other users who bought umbrellas but are uninterested in, and have never browsed or purchased, that product.\n", "- You avoid the “cold start” problem. 
Collaborative filtering creates a potential cold start scenario when a new website or community has few users and lacks user connections. Although content-based filtering needs some initial inputs from users to start making recommendations, the quality of early recommendations is generally better than that of a collaborative system, which requires the addition and correlation of millions of data points before becoming optimized." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cons\n", "\n", "- When the system is limited to recommending content of the same type as the user is already using, the value of the recommendation system is significantly lower than when content types from other services can also be recommended.\n", " \n", " \n", "  For example, recommending news articles based on news browsing is useful. Still, it would be much more useful if music, videos, products, discussions, etc., from different services could be recommended based on news browsing. To overcome this, most content-based recommender systems now use some form of hybrid system.\n", "\n", "- Requires domain knowledge to hand-engineer features.\n", "- Difficult to expand the user's interests." 
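] }, { "cell_type": "markdown", "metadata": {}, "source": [
"The content-based approach described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production method: it assumes item descriptions are plain text, represents each item as a TF-IDF vector, uses the mean vector of the liked items as the user profile, and ranks unseen items by cosine similarity. The catalogue `ITEMS` and the helper names `tfidf_vectors`, `cosine` and `recommend` are hypothetical, introduced only for this example.\n",
"\n",
"```python\n",
"import math\n",
"from collections import Counter\n",
"\n",
"# Hypothetical toy catalogue: item name -> free-text description.\n",
"ITEMS = {\n",
"    'space_odyssey': 'space crew explores distant planet',\n",
"    'galaxy_wars': 'space battle between rebel crew and empire',\n",
"    'baking_show': 'amateur bakers compete in baking contest',\n",
"    'pasta_masters': 'chefs compete to cook the best pasta',\n",
"}\n",
"\n",
"\n",
"def tfidf_vectors(docs):\n",
"    # Map each item name to a sparse {term: tf-idf weight} vector.\n",
"    tokens = {name: text.lower().split() for name, text in docs.items()}\n",
"    # Document frequency: number of descriptions that contain each term.\n",
"    df = Counter(term for words in tokens.values() for term in set(words))\n",
"    vectors = {}\n",
"    for name, words in tokens.items():\n",
"        tf = Counter(words)\n",
"        vectors[name] = {\n",
"            term: (count / len(words)) * math.log(len(tokens) / df[term])\n",
"            for term, count in tf.items()\n",
"        }\n",
"    return vectors\n",
"\n",
"\n",
"def cosine(u, v):\n",
"    # Cosine similarity between two sparse vectors stored as dicts.\n",
"    dot = sum(w * v.get(term, 0.0) for term, w in u.items())\n",
"    norm_u = math.sqrt(sum(w * w for w in u.values()))\n",
"    norm_v = math.sqrt(sum(w * w for w in v.values()))\n",
"    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0\n",
"\n",
"\n",
"def recommend(liked, docs, top_n=1):\n",
"    # User profile = mean TF-IDF vector of the items the user liked.\n",
"    vectors = tfidf_vectors(docs)\n",
"    profile = Counter()\n",
"    for name in liked:\n",
"        for term, weight in vectors[name].items():\n",
"            profile[term] += weight / len(liked)\n",
"    # Score only the items the user has not interacted with yet.\n",
"    scores = {\n",
"        name: cosine(profile, vec)\n",
"        for name, vec in vectors.items()\n",
"        if name not in liked\n",
"    }\n",
"    return sorted(scores, key=scores.get, reverse=True)[:top_n]\n",
"\n",
"\n",
"print(recommend(['space_odyssey'], ITEMS))\n",
"```\n",
"\n",
"With `space_odyssey` as the only liked item, `galaxy_wars` ranks first because it is the only other description that shares the terms `space` and `crew`. A real system would typically add stop-word removal and a library vectorizer instead of whitespace tokenization."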
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Collaborative Filtering" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Collaborative filtering is based on the assumption that people who agreed in the past will agree in the future, and that they will like items similar to those they liked in the past.\n", "- This system generates recommendations using only the rating profiles of different users or items.\n", "- Algorithms used to measure user similarity and item similarity include nearest neighbours, Pearson correlation, etc.\n", "\n", "Two types of ratings \n", "\n", "| Explicit ratings | Implicit ratings |\n", "|-|-|\n", "| explicitly given by the user | inferred from the user's behaviour |\n", "| like/dislike | watch time / session time |\n", "| rating on a 1-5 scale | watch count |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pros\n", "\n", "The good thing about CF is that it doesn't need to understand the item itself. It only needs to learn the patterns in the ratings to give accurate predictions.\n", "\n", "### Cons\n", "\n", "| Problem | Description |\n", "|-|-|\n", "| Cold start | New users -> less data -> less accurate recommendations. |\n", "| Scalability | With millions of users and products, the computational requirements are high. |\n", "| Sparsity | Not every item is rated by every user, so the user-item matrix has a lot of missing data. |\n", "| No context features | The learned features have no context; they don't contain domain knowledge. 
|" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparison\n", "\n", "### Between content-based and collaborative filtering methods\n", "\n", "\n", "\\begin{matrix}\n", " \\text{the content-based method uses similarity between items in an embedding space}\\\\\n", " \\downarrow \\\\\n", " \\text{does not need other users' data}\\\\\n", " \\downarrow \\\\\n", " \\text{own data, own preferences, own recommendations}\n", "\\end{matrix}\n", "\n", "`domain knowledge is necessary`\n", "\n", "----------\n", "\n", "\\begin{matrix}\n", " \\text{collaborative filtering learns latent features}\\\\\n", " \\downarrow \\\\\n", " \\text{user-item similarities}\\\\\n", " \\downarrow \\\\\n", " \\text{in an embedding space}\n", "\\end{matrix}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Knowledge based\n", "\n", "Ask users for their preferences and then use those inputs to generate recommendations.\n", "\n", "### Pros\n", "\n", "- No interaction data needed.\n", "- High-fidelity data, because users report it themselves.\n", "\n", "### Cons\n", "\n", "- Needs users to supply their data.\n", "- User data privacy concerns." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pitfalls of Recommendation Systems\n", "\n", "* The user space and item space are sparse\n", "    - Only a few users rate items.\n", "    - Most users rate only a very small set of items.\n", "\n", "* The data is skewed\n", "    - Some items are far more popular than others.\n", "    - Some users are far more active than others, so many features end up being fit from only a few users.\n", "\n", "`embeddings` : An embedding is a map from a collection of items to a finite-dimensional vector space." 
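] }, { "cell_type": "markdown", "metadata": {}, "source": [
"The sparsity and skew pitfalls listed above can be measured directly from a ratings log. The sketch below is illustrative only: `RATINGS` is a made-up list of `(user, item, rating)` triples, and `sparsity` and `top_share` are hypothetical helper names, not part of any library.\n",
"\n",
"```python\n",
"from collections import Counter\n",
"\n",
"# Hypothetical explicit ratings as (user, item, rating) triples.\n",
"RATINGS = [\n",
"    ('ann', 'item_a', 5), ('ann', 'item_b', 3), ('ann', 'item_c', 4),\n",
"    ('bob', 'item_a', 4),\n",
"    ('cat', 'item_a', 2),\n",
"    ('dan', 'item_a', 5), ('dan', 'item_d', 1),\n",
"]\n",
"\n",
"\n",
"def sparsity(ratings):\n",
"    # Fraction of cells in the user-item matrix that have no rating.\n",
"    users = {user for user, _, _ in ratings}\n",
"    items = {item for _, item, _ in ratings}\n",
"    observed = {(user, item) for user, item, _ in ratings}\n",
"    return 1.0 - len(observed) / (len(users) * len(items))\n",
"\n",
"\n",
"def top_share(ratings, index):\n",
"    # Share of all ratings held by the most frequent user (index 0)\n",
"    # or the most frequent item (index 1).\n",
"    counts = Counter(row[index] for row in ratings)\n",
"    return counts.most_common(1)[0][1] / len(ratings)\n",
"\n",
"\n",
"print(f'sparsity: {sparsity(RATINGS):.2f}')\n",
"print(f'most rated item share: {top_share(RATINGS, 1):.2f}')\n",
"print(f'most active user share: {top_share(RATINGS, 0):.2f}')\n",
"```\n",
"\n",
"In this toy log, 9 of the 16 user-item cells are empty, and `item_a` alone receives 4 of the 7 ratings, which is the kind of skew described above."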
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hybrid Recommendation Systems" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```text\n", "                                   Recommendation\n", "                                       Systems\n", "                                          |\n", "                                          V\n", "        +---------------------------------+---------------------------------+\n", "        |                                 |                                 |\n", "        V                                 V                                 V\n", "  Content-Based                Collaborative Filtering               Knowledge-Based\n", "        |                                 |                                 |\n", "        +---------------------------------+---------------------------------+\n", "                                          |\n", "                                          V\n", "                                       Hybrid\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Content-based features can be \n", "\n", "- Structured\n", "\n", "    - Genres\n", "    - Themes\n", "    - Actors / directors involved\n", "    - Professional rating\n", "\n", "- Unstructured\n", "\n", "    - Summary text\n", "    - Movie stills\n", "    - Trailer\n", "    - Professional reviews\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Collaborative filtering features can be \n", "\n", "- Structured\n", "\n", "    - User ratings\n", "    - User views\n", "    - Wishlist / add to cart\n", "    - Purchase history\n", "\n", "\n", "- Unstructured\n", "\n", "    - Reviews\n", "    - Answered questions\n", "    - Submitted photos and videos" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Knowledge-based features can be\n", "\n", "- Structured\n", "\n", "    - Demographics\n", "    - Location / language\n", "    - Preferences\n", "\n", "- Unstructured\n", "\n", "    - User's \"about me\" text\n" ] } ], "metadata": { "interpreter": { "hash": "dba788e4a50ad11c3aca04f6a487ccbbf2decea49c956f88ab099965f16291a4" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.1" } }, "nbformat": 
4, "nbformat_minor": 4 }