Appliances in commercial buildings are increasingly connected to the Internet and programmatically controllable. However, as the number of smart appliances grows, identifying and controlling one instance among thousands in a building becomes challenging. Existing methods have various problems when deployed in large commercial buildings. For example, proprietary remote controllers and smartphone apps become unmanageable. Voice or gesture command assistants require users to memorize many control commands in advance. Attaching visual markers (e.g., QR codes) to appliances introduces considerable deployment overhead and does not work at a distance. In this dissertation, we introduce new approaches for easier appliance selection and interaction. We first study how several indoor localization approaches can be used to generate a list of nearby appliances for users to choose from. Wi-Fi signal strength fingerprinting can provide a rough location estimate with an error of 10 meters. It can also be fused with other types of information, such as acoustic background noise. However, indoor localization can only shorten the displayed appliance list and is not sufficient to provide a quick and intuitive appliance selection mechanism. In comparison, identifying an appliance by simply pointing a smartphone camera at it and controlling it through a graphical overlay interface is more intuitive. We introduce SnapLink, a responsive and accurate vision-based system for mobile appliance identification and interaction using image localization. Compared to the image retrieval approaches used in previous vision-based appliance control systems, SnapLink exploits 3D models to improve identification accuracy and reduces deployment overhead via quick video captures and a simplified labeling process. To evaluate SnapLink, we collected training videos from 39 rooms to represent the scale of a modern commercial building.
SnapLink achieves a 94% successful appliance identification rate among 1526 test images of 179 appliances, with an average server processing time of 120 ms. Furthermore, we show that SnapLink is robust to differences in viewing angle and distance, illumination changes, and daily changes in the environment. On top of SnapLink, we build MARVEL (Mobile Augmented Reality with Viable Energy and Latency) to provide a continuous appliance identification and interaction experience. MARVEL identifies appliances with imperceptible latency (~100 ms) and low energy consumption on regular mobile devices. In contrast to conventional mobile augmented reality (MAR) systems, which recognize objects using image-based computations performed in the cloud, MARVEL mainly uses a mobile device’s local inertial sensors to recognize and track multiple objects, computing local optical flow and offloading images only when necessary. We propose a novel system architecture that uses local inertial tracking, local optical flow, and visual tracking in the cloud synergistically. We also investigate how to minimize the overhead of image computation and offloading. We have implemented and deployed a holistic prototype system in a commercial building and extensively evaluated MARVEL’s performance. The evaluation reveals that the efficient use of a mobile device’s capabilities significantly lowers latency and energy consumption without sacrificing accuracy.
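
As a rough illustration of the "offload only when necessary" idea described above, the following Python sketch compares the on-screen shift predicted from inertial rotation with the shift measured by local optical flow, and requests a cloud-side correction only when the two disagree. This is a minimal sketch under stated assumptions, not MARVEL's implementation: the function names, the pinhole approximation for a point near the image center, the sign conventions, and the 20-pixel threshold are all hypothetical.

```python
import math

def predict_shift(yaw_rad, pitch_rad, focal_px):
    """Predict the pixel shift of a label from camera rotation alone.

    Assumptions (illustrative only): pinhole camera, label near the image
    center, image y axis pointing down, positive yaw turns the camera right,
    positive pitch tilts it up.
    """
    dx = -focal_px * math.tan(yaw_rad)   # turning right moves the label left
    dy = focal_px * math.tan(pitch_rad)  # tilting up moves the label down
    return (dx, dy)

def should_offload(inertial_shift, flow_shift, threshold_px=20.0):
    """Return True when inertial and optical-flow estimates disagree.

    A large disagreement suggests that inertial tracking has drifted, so an
    image would be sent to the server for visual correction. The threshold is
    a hypothetical tuning parameter.
    """
    dx = inertial_shift[0] - flow_shift[0]
    dy = inertial_shift[1] - flow_shift[1]
    return math.hypot(dx, dy) > threshold_px

if __name__ == "__main__":
    # Hypothetical readings: 2 degrees of yaw with a 1000-pixel focal length.
    predicted = predict_shift(math.radians(2.0), 0.0, focal_px=1000.0)
    measured = (-33.0, 1.0)  # made-up optical-flow displacement in pixels
    print("offload image:", should_offload(predicted, measured))
```

In such a design the common case is handled entirely on the device, and the network and the camera pipeline are exercised only when the cheap local estimates stop agreeing.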