"Smart display" redirects here. For the touchscreen computer project by Microsoft, see Smart Display. For smart television displays, see Smart TV.
A smart speaker is a type of loudspeaker and voice command device with an integrated virtual assistant that offers interactive actions and hands-free activation with the help of one or more "hot words". Some smart speakers also act as smart devices that use Wi-Fi and other protocol standards to extend usage beyond audio playback, such as controlling home automation devices. Features can include, but are not limited to, compatibility across a number of services and platforms, peer-to-peer connection through mesh networking, and virtual assistants. Each speaker can have its own designated interface and in-house features, usually launched or controlled via an application or home automation software.[1] Some smart speakers also include a screen to show the user a visual response.
As of summer 2022, it is estimated by NPR and Edison Research that 91 million Americans (35% of the population over 18) own a smart speaker.[2]
A smart speaker with a touchscreen is known as a smart display.[3][4] It is a smart device that integrates a conversational user interface with a display screen to augment voice interaction with images and video. Smart displays are powered by one of the common voice assistants and offer controls for smart home devices, streaming apps, and web browsers with touch controls for selecting content. The first smart displays were introduced in 2017 by Amazon (Amazon Echo) and Google (Google Home/Nest).
The built-in microphone in a smart speaker listens continuously for "hot words" followed by a command. These continuously listening microphones raise privacy concerns among users.[7] The concerns include what is being recorded, how the data will be used, how it will be protected, and whether it will be used for invasive advertising.[8][9] Furthermore, an analysis of Amazon Echo Dots showed that 30–38% of "spurious audio recordings were human conversations", suggesting that these devices capture audio beyond what is strictly needed to detect the hot word.[10]
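The hot-word gating described above can be illustrated with a minimal sketch. It is a toy model and not a description of how any vendor's device works: it operates on already-transcribed words rather than audio, and the hot word "computer", the function name extract_commands, and the fixed maximum command length are all hypothetical choices made for illustration.

```python
from typing import Iterable, List


def extract_commands(words: Iterable[str], hot_word: str = "computer",
                     max_command_words: int = 4) -> List[str]:
    """Return the commands that follow each occurrence of the hot word.

    Words heard before a hot word are discarded. After a hot word, up to
    max_command_words words are collected as one command.
    (Hypothetical helper; real devices work on audio, not text.)
    """
    commands: List[str] = []
    current: List[str] = []
    listening = False  # True only while a command is being captured

    for word in words:
        token = word.lower().strip(".,!?")
        if not listening:
            # Idle state: ignore everything except the hot word.
            if token == hot_word:
                listening = True
                current = []
            continue
        # Capture state: collect words until the command is full.
        current.append(token)
        if len(current) == max_command_words:
            commands.append(" ".join(current))
            listening = False

    # Keep a command that was cut short by the end of the input.
    if listening and current:
        commands.append(" ".join(current))
    return commands
```

In this toy model, speech before the hot word is dropped and only the words that follow it are kept. The spurious recordings reported above would correspond to the detector triggering when no hot word was spoken, which this sketch does not model.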
As a wiretap
There are strong concerns that the always-listening microphone of a smart speaker makes it a candidate for wiretapping. In 2017, British security researcher Mark Barnes showed that pre-2017 Echo devices have exposed pins that allow a compromised OS to be booted.[11]
According to Umar Iqbal, an assistant professor at Washington University in St. Louis, research indicates that data from consumer interactions with Alexa was used to target advertisements and products to consumers, and that over 40% of transmitted data lacked proper encryption, raising privacy concerns.[12] Furthermore, because smart speakers can always capture audio, they pick up conversations that are unrelated to commands given to the speaker. Speech from other members of the household, consumers talking on the phone, and even TV audio can be picked up by these speakers and stored for future use by companies.[13]
Voice assistance vs privacy
While voice assistants provide a valuable service, there can be some hesitation towards using them in various social contexts, such as in public or around other users.[14] However, only more recently have users begun interacting with voice assistants through smart speakers rather than through the phone. On the phone, most voice assistants can be engaged by a physical button (e.g., Siri with a long press of the home button) rather than solely by a hot word, as on a smart speaker. While this distinction increases privacy by limiting when the microphone is on, users felt that having to press a button first removed the convenience of voice interaction.[15] This trade-off is not unique to voice assistants; as more devices come online, the trade-off between convenience and privacy grows.[16]
Factors influencing adoption
While many factors influence smart speaker adoption, specifically with regard to privacy, Lau et al. define five distinct categories as pros and cons: convenience, identity as an early adopter, contributing factors, perceived lack of utility, and privacy and security concerns.[7]
Smart speakers also benefit from their instant integration into the life of the consumer. Their capabilities include, but are not limited to, setting alarms, sending voice messages to other smart devices in the home, sending messages on the user's behalf, giving instant answers to basic questions on subjects such as mathematics, geography, history, science and literature, and creating task lists that can pair with a phone to issue reminders later. Although these tasks can be completed on a phone, consumers tend to lean towards smart speakers because their range is much greater than that of a phone and because the voice assistant can be reached without physically interacting with the device, whereas on most smartphones some part of the phone must be touched to activate the assistant.[17]
Another reason for the adoption of smart speakers has been their use in assisting people with disabilities. While most technology is limited by its need for the user to physically interact with the device, smart speakers are not bound by this limitation and can serve as an excellent tool for those who are unable to use their arms or legs.[18]
Security concerns
When configured without authentication, smart speakers can be activated by people other than the intended user or owner. For example, visitors to a home or office, or people in a publicly accessible area outside an open window, partial wall, or security fence, may be within hearing range of a speaker. One team demonstrated the ability to stimulate the microphones of smart speakers and smartphones through a closed window, from another building across the street, using a laser.[19]
Amazon Echo: 31 million devices in the U.S. (January 2018).[21] Supported languages, summer 2019: English (US, UK, Ireland, Canada and Australia); French (France and Canada); German; Italian; Japanese; Portuguese (Brazilian) and Spanish (Spain and Mexico).[22][23][24]
Google Home: 14 million devices in the U.S. (January 2018).[21] Supported languages, summer 2019: Danish, Dutch, English (U.S., U.K., Canada, Australia, India and Singapore), French (France and Canada), German (Austria and Germany), Hindi, Italian, Japanese, Korean, Norwegian, Portuguese (Brazilian), Spanish (Spain and Mexico) and Swedish.[29][24]
October 2019: English (US, UK, Canada, Australia and India); Chinese (Simplified); French; German; Italian; Japanese; Portuguese (Brazil); Spanish (Spain and Mexico)[31]
Support for Cortana on the Harman Kardon INVOKE was officially discontinued on March 9, 2021.[32][33]
Lau, Josephine; Zimmerman, Benjamin; Schaub, Florian (1 November 2018). "Alexa, Are You Listening?: Privacy Perceptions, Concerns and Privacy-seeking Behaviors with Smart Speakers". Proceedings of the ACM on Human-Computer Interaction. 2 (CSCW): 102:1–102:31. doi:10.1145/3274371. S2CID 53223356.
Ford, Marcia; Palmer, William (2018). "Alexa, are you listening to me? An analysis of Alexa voice service network traffic". Personal and Ubiquitous Computing: 1–13.
Lambertsson, Christoffer (2017). Expectations of Privacy in Voice Interaction–A Look at Voice Controlled Bank Transactions (Ph.D. dissertation). KTH Royal Institute of Technology.