Technology in many forms is ubiquitously available through much of the world. The study of Ubiquitous Computing (Ubicomp) is concerned with enabling a future in which the most useful applications of such technology are feasible to build and pleasing to use. But what is useful? What is usable? What do people actually need? These questions are only beginning to be answered, partly because Ubicomp systems are more difficult to evaluate than desktop applications, particularly at the early stages of design. This difficulty stems from issues such as scale and ambiguity, and from a tendency to apply Ubicomp in ongoing, daily-life settings, unlike task- and work-oriented desktop systems. This paper presents a case study of three Ubicomp systems that were evaluated at multiple stages of their design. For each system, we briefly describe the application and its evaluation, and then present a set of lessons that we learned about the evaluation techniques we used. Our goal is to better understand how evaluation techniques need to evolve for the Ubicomp domain. Based on the lessons that we learned, we suggest four challenges for evaluation.