This document summarizes key considerations for evaluating collaborative filtering recommender systems. It covers the user tasks being evaluated, the types of analysis and datasets used, ways to measure prediction quality and other attributes, and how to evaluate the overall system from the user's perspective. It also presents empirical results showing that, on one dataset, the accuracy metrics studied collapsed into three groups: metrics within a group were strongly correlated with one another, while metrics in different groups were uncorrelated. The document aims to help researchers and practitioners properly evaluate and compare recommender system algorithms.