SoundHound AI, Inc. (SoundHound) is a global leader in conversational intelligence, offering independent Voice AI solutions that enable businesses to deliver high-quality conversational experiences to their customers.
Built on proprietary technology, SoundHound’s Voice AI delivers best-in-class speed and accuracy in numerous languages to product creators across automotive, TV, and IoT, and to customer service industries via groundbreaking AI-driven products like Smart Answering, Smart Ordering, Dynamic Interaction, and Employee Assist. Along with SoundHound Chat AI, a powerful voice assistant with integrated generative AI, SoundHound powers millions of products and services and processes billions of interactions each year for world-class businesses.
A voice-enabled conversational user interface is a more natural interface for nearly all use cases, and product creators should be able to design, customize, differentiate, innovate on, and monetize the interface to their own products rather than outsourcing it to a third-party assistant. For example, using SoundHound, businesses can voice-enable their products so consumers can say things like ‘Turn off the air conditioning and lower the windows’ while in their cars, or ‘Find romantic comedies released in the last year’ while streaming on their TV, and can even place food orders before arriving at a restaurant by talking to their cars, TVs, or other IoT devices. SoundHound’s technology can also address complex user queries such as ‘Show me all restaurants within half a mile of the Space Needle that are open past 9pm on Wednesdays and have outdoor seating,’ and follow-up qualifications such as ‘Okay, don’t show me anything with less than 3 stars or fast food.’
The SoundHound developer platform, Houndify, is an open-access platform that allows developers to leverage SoundHound’s Voice AI technology and a library of over 100 content domains, including commonly used domains for points of interest, weather, flight status, sports and more. SoundHound's Collective AI is an architecture for connecting domain knowledge that encourages collaboration and contribution among developers. The architecture is based on proprietary software engineering technology, CaiLAN (Conversational AI Language), and machine learning technology, CaiNET (Conversational AI Network) to ensure fast, accurate and appropriate responses.
The company’s market position is strengthened by the technical barriers to entry in the Voice AI space, which tend to discourage new market participants. Furthermore, the company’s technology is backed by significant investments in intellectual property, with over 192 patents granted and over 109 patents pending, spanning multiple fields, including speech recognition, natural language understanding, machine learning, monetization and more.
Strategy
The company’s strategy includes voice-enabling products such as IoT devices with its Voice AI.
Technical Development
SoundHound’s technology represents the evolution of several disruptive breakthroughs in Voice AI and sound recognition developed over almost two decades. In 2009, the company’s founders used innovative audio and music identification technology to launch the SoundHound music identification app, which has since received over 300 million downloads.
The company’s engineers knew that for a true voice engine to flourish, it needed to understand speech directly, just like humans do.
In 2015, SoundHound introduced the Houndify Voice AI platform, built on its breakthrough Speech-to-Meaning and Deep Meaning Understanding technologies, which, to SoundHound’s knowledge, were voice interaction technologies not yet broadly available at the time.
After the company went public in 2022, SoundHound rapidly brought new products to market, including Smart Ordering, Dynamic Interaction™, SoundHound Chat AI, and Smart Answering (see ‘Products and Technology’ below).
Products and Technology
SoundHound’s momentum in the Voice AI market can, in large part, be attributed to the company’s large number of technology breakthroughs.
Houndify Platform
SoundHound’s Voice AI platform combines advanced AI with engineering expertise to help brands build conversational voice assistants. From proprietary components to customizable and scalable solutions, the company offers tools to build a highly accurate and responsive voice user interface.
The suite of Houndify tools includes Application Programming Interfaces (‘API’) for text and voice queries, support for custom commands, an extensive library of content domains, Software Development Kits (‘SDKs’) for a wide range of platforms, collaboration capabilities, diagnostic tools, and built-in analytics.
Houndify provides a web API that accepts text queries or audio and returns actionable results in JavaScript Object Notation (‘JSON’), so anyone with an internet connection can add Voice AI to a product or application.
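As a rough illustration of this request-and-response pattern, the Python sketch below builds a text-query URL and pulls results out of a JSON reply. The endpoint, the `query` parameter, and the field names (`Status`, `AllResults`, `CommandKind`, `WrittenResponse`) are hypothetical placeholders, not Houndify’s documented interface, and authentication is omitted. A canned response replaces the network call so the parsing logic runs offline.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; the real Houndify URL and auth headers are not shown here.
API_BASE = "https://api.example.com/v1/text"


def build_text_query_url(query: str) -> str:
    """Build a GET URL for a text query (the `query` parameter name is assumed)."""
    return f"{API_BASE}?{urlencode({'query': query})}"


def parse_response(body: str) -> list[dict]:
    """Extract actionable results from a JSON body (field names are assumed)."""
    data = json.loads(body)
    if data.get("Status") != "OK":
        raise ValueError(f"request failed: {data.get('Status')}")
    return [
        {"kind": r.get("CommandKind"), "text": r.get("WrittenResponse")}
        for r in data.get("AllResults", [])
    ]


# A canned body standing in for what a server might return.
sample = json.dumps({
    "Status": "OK",
    "AllResults": [
        {"CommandKind": "WeatherCommand", "WrittenResponse": "It is 18 degrees and sunny."}
    ],
})

print(build_text_query_url("what's the weather"))
print(parse_response(sample))
```

Keeping URL construction and response parsing separate from the HTTP call makes each piece easy to test without network access.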
CaiNET and CaiLAN Expert Domain Selections
SoundHound’s CaiNET software uses machine learning to improve how domains work together, so the platform can better handle complex queries.
SoundHound’s proprietary CaiLAN software expertly arbitrates responses so users get better answers from the right domain. Together, these technologies support uses such as natural language processing, predictive analytics, building language models, and speech translation.
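Arbitration can be pictured as asking every domain how well it understood a query and keeping the most confident answer. The Python sketch below is a minimal illustration under that assumption only; CaiLAN’s actual arbitration logic is not public, and the domain handlers, keyword-based confidence scores, and threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    domain: str
    confidence: float  # 0.0-1.0: how well the domain thinks it understood the query
    answer: str


# Hypothetical domain handlers; keyword counting stands in for real understanding.
def weather_domain(query: str) -> Candidate:
    hits = sum(w in query for w in ("weather", "rain", "temperature"))
    return Candidate("weather", min(1.0, 0.5 * hits), "Forecast: light rain.")


def flight_domain(query: str) -> Candidate:
    hits = sum(w in query for w in ("flight", "gate", "departure"))
    return Candidate("flight_status", min(1.0, 0.5 * hits), "UA 100 departs on time.")


def arbitrate(query: str, domains: list[Callable[[str], Candidate]],
              threshold: float = 0.4) -> Optional[Candidate]:
    """Ask every domain, then keep the most confident answer above the threshold."""
    candidates = [d(query.lower()) for d in domains]
    best = max(candidates, key=lambda c: c.confidence)
    return best if best.confidence >= threshold else None


domains = [weather_domain, flight_domain]
print(arbitrate("What's the temperature and will it rain?", domains).domain)  # weather
print(arbitrate("Tell me a joke", domains))  # None
```

Returning `None` when no domain is confident lets a caller fall back to another source rather than forcing a poor match.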
Automatic Speech Recognition (‘ASR’)
The company’s highly optimized, tunable, and scalable ASR engine supports vocabulary sizes containing millions of words. Houndify’s machine learning infrastructure allows the company to tune the engine to achieve optimal Central Processing Unit (‘CPU’) performance while delivering high accuracy rates.
Houndify’s language and acoustic modelling architecture also uses machine learning to increase word recognition accuracy. Rapid iteration is possible due to the company’s accelerated training pipeline and architecture that improves as data is collected. Highly accurate transcriptions result from advanced acoustic models trained to perform in a variety of scenarios — including in severely noisy environments and when accented language is spoken.
Natural Language Understanding (‘NLU’)
The company’s proprietary Speech-to-Meaning technology tracks speech in real-time and understands the context, even before the user has finished speaking. Instead of the typical two-step process of transcribing speech into text and then passing the text into an NLU model, Houndify can accomplish both of these tasks in one step, delivering faster and more accurate results.
Because Houndify processes and understands speech as it arrives, voice assistants built on it can respond the instant a user stops speaking. Understanding speech in real time, without additional processing steps or waiting for the user to finish, creates responsive and natural conversations between people and products.
By understanding context, Houndify responds accurately by distinguishing between similar-sounding words and names. The company’s NLU can tell apart words that sound the same but have different spellings and meanings. For example, if a user wants to navigate to 272 Hoch Street in Dayton, Ohio, Houndify won’t look for Hawk Street.
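One simple way to picture context-based disambiguation is to prefer sound-alike candidates that actually exist in the place the user mentioned. The Python sketch below assumes a hypothetical gazetteer and recognizer scores; it illustrates the idea behind the Hoch Street example and is not how Houndify’s NLU is implemented.

```python
# Hypothetical gazetteer of streets known in each city (illustrative only).
GAZETTEER = {
    "dayton, ohio": {"hoch street", "main street"},
}


def pick_street(candidates: list[tuple[str, float]], city: str) -> str:
    """Choose among sound-alike street names using the city as context.

    `candidates` are (name, acoustic_score) pairs from a recognizer. Names that
    exist in the requested city win; the acoustic score only breaks ties.
    """
    known = GAZETTEER.get(city.lower(), set())
    return max(candidates, key=lambda c: (c[0].lower() in known, c[1]))[0]


# "Hawk" scores slightly higher acoustically, but only "Hoch" exists in Dayton.
hypotheses = [("Hawk Street", 0.55), ("Hoch Street", 0.45)]
print(pick_street(hypotheses, "Dayton, Ohio"))  # Hoch Street
```

Sorting on the tuple `(exists_in_city, score)` means context always outranks acoustics, and the acoustic score decides only when context cannot.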
Using the company’s proprietary Deep Meaning Understanding technology, a custom voice assistant can handle complex queries with compound criteria, support conversational follow-ups, address multiple questions, and filter results simultaneously, answering users’ most complex questions accurately and quickly.
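The behavior described here, compound criteria plus follow-ups that narrow an earlier answer, can be sketched as a conversation context that accumulates filters. The Python example below reuses the Space Needle query from the opening of this section. It is an illustration under stated assumptions, not Deep Meaning Understanding itself: the restaurant data, field names, and the `QueryContext` class are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Restaurant:
    name: str
    miles_from_landmark: float
    wed_close_hour: int  # 24-hour clock
    outdoor_seating: bool
    stars: float
    fast_food: bool


@dataclass
class QueryContext:
    """Accumulates filter predicates across conversational turns."""
    filters: list = field(default_factory=list)

    def refine(self, *predicates):
        self.filters.extend(predicates)
        return self

    def run(self, items):
        return [r.name for r in items if all(f(r) for f in self.filters)]


# Made-up sample data for illustration.
restaurants = [
    Restaurant("Harbor Grill", 0.3, 22, True, 4.5, False),
    Restaurant("Quick Bite", 0.2, 23, True, 3.8, True),
    Restaurant("Night Owl Cafe", 0.4, 23, True, 2.5, False),
    Restaurant("Far Away Bistro", 1.2, 23, True, 4.8, False),
    Restaurant("Early Diner", 0.1, 20, True, 4.0, False),
]

# Turn 1: within half a mile, open past 9pm on Wednesdays, outdoor seating.
ctx = QueryContext().refine(
    lambda r: r.miles_from_landmark <= 0.5,
    lambda r: r.wed_close_hour > 21,
    lambda r: r.outdoor_seating,
)
print(ctx.run(restaurants))  # ['Harbor Grill', 'Quick Bite', 'Night Owl Cafe']

# Turn 2: "Okay, don't show me anything with less than 3 stars or fast food."
ctx.refine(lambda r: r.stars >= 3, lambda r: not r.fast_food)
print(ctx.run(restaurants))  # ['Harbor Grill']
```

The follow-up adds constraints to the existing context instead of starting a new search, which is what lets the second turn omit the location and hours.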
These technologies are anchored by three important innovations: Speech-to-Meaning, Deep Meaning Understanding and Collective AI.
Speech-to-Meaning refers to SoundHound’s ability to convert speech to meaning simultaneously and in real time. Most traditional approaches first convert speech to text and then convert text to meaning, which can be both slower and less accurate. It is slower because the two steps run in sequence, and the extra processing time of the second step can be noticeable to the end user. It can be less accurate because if the speech-to-text step makes a mistake, the incorrect text is passed to the second step and the error propagates.
The company’s development of Speech-to-Meaning technology was inspired by the human brain. When a person listens to someone speaking, the brain does not convert speech to text and then text to meaning; it converts speech to meaning simultaneously and in real time. Likewise, as you speak to SoundHound’s technology, it performs speech recognition and language understanding simultaneously. This results in faster response times and higher accuracy, because real-time language understanding can feed into the real-time speech recognizer as additional information to reduce errors.
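The accuracy argument can be made concrete with a toy comparison. The Python sketch below contrasts a two-step pipeline, which commits to the top transcript before interpreting it, with n-best rescoring, where an understanding check penalizes transcripts that make no sense. Rescoring is a swapped-in, after-the-fact stand-in for the simultaneous approach described above, and the grammar, scores, and penalty are hypothetical.

```python
import re
from typing import Optional

# Toy grammar standing in for a language-understanding model (hypothetical).
ACTIONS = {"lower", "raise"}
TARGETS = {"windows", "volume", "temperature"}
PATTERN = re.compile(r"^(\w+) the (\w+)$")


def understand(text: str) -> Optional[tuple[str, str]]:
    """Map text to an (action, target) intent, or None if it makes no sense."""
    m = PATTERN.match(text)
    if m and m.group(1) in ACTIONS and m.group(2) in TARGETS:
        return m.group(1), m.group(2)
    return None


def two_step(nbest: list[tuple[str, float]]) -> Optional[tuple[str, str]]:
    """Transcribe first (keep only the top hypothesis), then interpret it."""
    top_text, _ = max(nbest, key=lambda h: h[1])
    return understand(top_text)


def joint(nbest: list[tuple[str, float]], penalty: float = 5.0) -> Optional[tuple[str, str]]:
    """Let understanding vote on recognition: meaningless hypotheses are penalized."""
    def score(h: tuple[str, float]) -> float:
        text, acoustic_logprob = h
        return acoustic_logprob - (0.0 if understand(text) else penalty)
    return understand(max(nbest, key=score)[0])


# Recognizer output for one utterance: (transcript, acoustic log-probability).
nbest = [("lower the widows", -1.0), ("lower the windows", -1.3)]
print(two_step(nbest))  # None: the top transcript's error propagates
print(joint(nbest))     # ('lower', 'windows')
```

A real-time system would apply this kind of feedback while decoding rather than after a complete n-best list exists; the toy only shows why understanding can correct recognition.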
Deep Meaning Understanding is the company’s approach to language understanding that allows the company’s Voice AI platform to understand highly complex conversation.
Collective AI is an architecture that gives SoundHound the potential to improve the understanding capability of its platform exponentially based on linear contributions.
Most other platforms add skills or domains that are separate and don’t interact with each other. For them, linear contribution results in linear growth in understanding, which is less scalable. With the Collective AI architecture, SoundHound domains can be interconnected and learn from each other. As developers contribute to the platform, the platform’s understanding capability can grow exponentially.
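To make the linear-versus-exponential contrast concrete, the Python sketch below uses a deliberately idealized counting model, an assumption for illustration rather than how SoundHound measures understanding: isolated domains each answer one kind of query, while interconnected domains can combine in any non-empty subset. Real domains will not combine in every possible way, so the second count is an upper bound.

```python
from math import comb


def isolated_capability(n_domains: int) -> int:
    """Separate skills: each query is answered by exactly one domain."""
    return n_domains


def interconnected_capability(n_domains: int) -> int:
    """Idealized: any non-empty combination of domains can answer a query jointly.

    Summing C(n, k) for k = 1..n gives 2**n - 1 combinations.
    """
    return sum(comb(n_domains, k) for k in range(1, n_domains + 1))


for n in (1, 2, 5, 10):
    print(n, isolated_capability(n), interconnected_capability(n))
```

Under this model, each added domain increases the first count by one but roughly doubles the second.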
Smart Ordering
SoundHound Smart Ordering offers an easy-to-understand voice assistant for restaurants that takes phone orders and automatically processes them by integrating seamlessly with multiple point-of-sale (‘POS’) systems. For enterprises, the company also offers a flexible Gateway to integrate with custom POS systems.
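Supporting several POS systems behind one ordering assistant is a natural fit for the adapter pattern: the assistant talks to a single interface, and each POS system gets its own adapter. SoundHound has not published its Gateway’s design, so the Python sketch below is only an illustration; `POSGateway`, both adapters, and their payload formats are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class OrderLine:
    item: str
    quantity: int


class POSGateway(ABC):
    """Hypothetical common interface the ordering assistant talks to."""

    @abstractmethod
    def submit(self, lines: list[OrderLine]) -> str:
        """Send an order to the POS system and return a ticket ID."""


class VendorAPOS(POSGateway):
    """Adapter for an off-the-shelf POS system (made-up payload format)."""

    def __init__(self) -> None:
        self.tickets: list[dict] = []

    def submit(self, lines: list[OrderLine]) -> str:
        payload = {"items": [{"sku": line.item, "qty": line.quantity} for line in lines]}
        self.tickets.append(payload)
        return f"A-{len(self.tickets)}"


class CustomPOS(POSGateway):
    """Adapter an enterprise might write for an in-house POS system."""

    def __init__(self) -> None:
        self.queue: list[str] = []

    def submit(self, lines: list[OrderLine]) -> str:
        self.queue.append(";".join(f"{line.quantity}x{line.item}" for line in lines))
        return f"C-{len(self.queue)}"


def place_phone_order(gateway: POSGateway, lines: list[OrderLine]) -> str:
    # The voice assistant depends only on the interface, not on a specific POS.
    return gateway.submit(lines)


order = [OrderLine("burger", 1), OrderLine("fries", 2)]
vendor, custom = VendorAPOS(), CustomPOS()
print(place_phone_order(vendor, order))  # A-1
print(place_phone_order(custom, order), custom.queue)  # C-1 ['1xburger;2xfries']
```

Adding support for another POS system then means writing one more adapter, with no change to the voice assistant.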
Dynamic Interaction
Dynamic Interaction is a category-level breakthrough in conversational AI that raises the bar for human-computer interaction by not only recognizing and understanding speech, but also responding and acting in real-time. Where existing voice technology requires wake words and relies on turn-taking with awkward pauses to process requests, Dynamic Interaction uses the twin technologies of fragment parsing – which breaks speech down to partial-utterances and processes them in real-time – and full-duplex audio-visual integration to create an instantaneous, next-generation experience in human-computer interaction.
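Fragment parsing can be pictured as a parser that acts on each piece of an utterance as soon as it is understood, rather than waiting for the whole sentence. In the Python sketch below, a list of words stands in for partial speech recognition output, and the menu, number words, and correction rule (‘make that three’) are hypothetical. Dynamic Interaction’s actual parser and its full-duplex audio-visual integration are not modeled.

```python
NUMBERS = {"a": 1, "one": 1, "two": 2, "three": 3}
MENU = {"burger", "fries", "shake"}


def fragment_parser(words):
    """Yield cart snapshots as soon as each fragment is understood (toy grammar)."""
    cart = {}
    qty, last_item, correcting = 1, None, False
    for word in words:
        if word == "make":  # start of a correction, e.g. "make that three"
            correcting = True
        elif word in NUMBERS:
            if correcting and last_item:
                cart[last_item] = NUMBERS[word]  # revise the previous item
                correcting = False
                yield dict(cart)
            else:
                qty = NUMBERS[word]
        elif word in MENU:
            cart[word] = cart.get(word, 0) + qty
            last_item, qty = word, 1
            yield dict(cart)  # act now, before the sentence is over


stream = "add a burger and two fries actually make that three".split()
for snapshot in fragment_parser(stream):
    print(snapshot)
```

Because the parser is a generator, a display could update the order after ‘burger’ and again after ‘fries’, and then revise it when the correction arrives, all within one spoken sentence.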
SoundHound Chat AI
The company launched SoundHound Chat AI, which it expects to usher in a new phase of voice-enabled conversational AI by combining software engineering with machine-learning-based generative AI.
SoundHound Chat AI integrates with dozens of knowledge domains, pulling real-time data such as weather, sports, stocks, flight status, restaurants, and many more. The company combines this data with cutting-edge large language models such as OpenAI’s ChatGPT to deliver the most accurate, timely, and comprehensive responses. There is no need for awkward search queries, since you can speak to SoundHound Chat AI naturally, as you would to another person. You can also ask follow-up questions and give follow-up commands, without awkward pauses, to filter, sort, or add more information to the original request.
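SoundHound has not described exactly how Chat AI merges domain data with a large language model. One widely used pattern, retrieval-augmented prompting, fetches fresh data first and places it in the model’s prompt so the answer is grounded in it. The Python sketch below shows that pattern with a canned weather domain and a fake model; `fetch_weather`, `fake_llm`, and the prompt wording are hypothetical.

```python
from typing import Callable


def fetch_weather(city: str) -> dict:
    """Stand-in for a real-time weather domain; returns canned data."""
    return {"city": city, "condition": "light rain", "high_c": 14}


def build_grounded_prompt(question: str, facts: dict) -> str:
    """Place live domain data in the prompt so the model answers from it."""
    fact_lines = "\n".join(f"- {key}: {value}" for key, value in facts.items())
    return (
        "Answer using only the facts below. If they are not enough, say so.\n"
        f"Facts:\n{fact_lines}\n"
        f"Question: {question}"
    )


def answer(question: str, city: str, llm: Callable[[str], str]) -> str:
    """Fetch fresh data, ground the prompt, then ask the language model."""
    return llm(build_grounded_prompt(question, fetch_weather(city)))


def fake_llm(prompt: str) -> str:
    # Offline stand-in for a real LLM call: echoes the question line.
    return prompt.splitlines()[-1]


print(build_grounded_prompt("Do I need an umbrella?", fetch_weather("Seattle")))
print(answer("Do I need an umbrella?", "Seattle", fake_llm))
```

Passing the model in as a callable keeps the grounding logic testable offline and independent of any particular LLM provider.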
Smart Answering
SoundHound Smart Answering lets businesses of all kinds, including restaurants, build an easy-to-use, custom AI-powered voice assistant that can handle 100% of phone calls, covering greetings, hours, menus, location, delivery, wait times, policies, and promotions, along with SMS functionality for reservations and appointments and many more standard and custom options.
Wake Words
Wake words are the entry point into branded voice experiences, allowing users to invoke an assistant by speaking the brand’s name. Examples range from ‘Hey Pandora’ in a mobile app to ‘Hey Peugeot’ within a vehicle.
Rigorous development and testing enable the company’s wake words to perform in noisy environments while minimizing false positives and false negatives. The company trains advanced machine learning algorithms and Deep Neural Networks on high-volume training data, which gives its wake words broad robustness and high accuracy.
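The trade-off between false positives and false negatives shows up in how a detector turns per-frame model scores into a trigger decision. The Python sketch below assumes the scores come from a trained neural network that is not modeled here; the moving-average window, threshold, and refractory period are illustrative values, not SoundHound’s settings.

```python
from collections import deque


def detect_wake_word(frame_scores, threshold=0.8, window=3, refractory=5):
    """Return frame indices where the wake word fires.

    frame_scores: per-frame probabilities from a wake-word model (not modeled here).
    Averaging over `window` frames suppresses one-frame noise spikes (fewer false
    positives); skipping `refractory` frames after a trigger prevents double fires.
    """
    recent = deque(maxlen=window)
    triggers, cooldown = [], 0
    for i, score in enumerate(frame_scores):
        recent.append(score)
        if cooldown:
            cooldown -= 1
            continue
        if len(recent) == window and sum(recent) / window >= threshold:
            triggers.append(i)
            cooldown = refractory
    return triggers


# A lone noise spike at frame 2, then a genuine wake word around frames 7-10.
scores = [0.1, 0.2, 0.95, 0.1, 0.1, 0.2, 0.6, 0.9, 0.95, 0.9, 0.85, 0.2]
print(detect_wake_word(scores))  # [8]
```

With `window=1` the lone spike at frame 2 also fires, which is the false-positive case; raising `threshold` suppresses such spikes but makes it more likely that a quietly spoken wake word is missed.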
Custom Domains
The company’s library of over 100 public domains gives developers instant access to a broad range of content to fit their unique use cases. This includes multi-category content intended to appeal to a broad range of audiences, such as sports scores, weather, podcasts, travel information, recipes, and stock prices, among many others.
Companies can enhance product functionality or proprietary operations with Houndify Private Domains, allowing customization and development of more specific content. Customers who subscribe for this service have full access to their private domains securely on the company’s platform while retaining the ability to iterate and update content.
For example, an automotive manufacturer can update the content of its car’s user manual over time. In this way, SoundHound becomes a long-term ‘partner’ to its customers, helping companies create the domains they need to improve brand value for their own customers or end users.
Text-to-speech (‘TTS’)
A custom TTS voice helps companies differentiate themselves from the competition. Brands can fully express their personality by choosing the gender, tone, and character of the voice that will become their vocal identity.
The company’s machine learning algorithms transform recorded voices into large databases of spoken sounds to form entire vocabularies of natural language — adapted to the user’s environment. The company can transform any voice to generate a high-quality TTS with a small CPU footprint.
Edge and Cloud Connectivity
With its edge (embedded) offering, the company provides a fully embedded voice solution for brands that want the convenience of a voice user interface without the privacy or connectivity concerns of the internet. It includes full access to custom commands and the ability to instantly update commands during development.
With its cloud offering, the company equips your voice assistant with real-time data from the cloud, so it can deliver the most relevant responses with no CPU or memory restrictions, while you retain ownership of customer relationships through access to data and analytics.
Houndify Edge Hybrid solutions combine the capabilities of full cloud connectivity with the reliability of embedded edge voice technology, and are designed to keep devices always on and responsive to commands. They allow for over-the-air product updates and a broader voice experience, with the level of cloud connectivity that best matches the product and its users.
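A hybrid device needs a policy for choosing between cloud and edge processing. The Python sketch below assumes one simple policy, cloud first with an on-device fallback when the device is offline or the cloud call fails; the command table and cloud functions are hypothetical, and Houndify Edge Hybrid’s actual routing is not described in this document.

```python
from typing import Callable, Optional

# Commands the on-device (edge) engine can handle without a connection (illustrative).
EDGE_COMMANDS = {
    "turn on the lights": "lights_on",
    "turn off the lights": "lights_off",
}


def edge_engine(utterance: str) -> Optional[str]:
    """Embedded recognizer stand-in: only knows a fixed set of local commands."""
    return EDGE_COMMANDS.get(utterance.lower())


def hybrid_handle(utterance: str, cloud: Callable[[str], str], online: bool) -> str:
    """Prefer the cloud for its real-time data; fall back to the edge engine."""
    if online:
        try:
            return cloud(utterance)
        except (TimeoutError, ConnectionError):
            pass  # treat a failed cloud call like being offline
    result = edge_engine(utterance)
    return result if result is not None else "unavailable_offline"


def good_cloud(utterance: str) -> str:
    return f"cloud:{utterance}"


def flaky_cloud(utterance: str) -> str:
    raise TimeoutError("no response from server")


print(hybrid_handle("Turn on the lights", good_cloud, online=True))    # cloud:Turn on the lights
print(hybrid_handle("Turn on the lights", flaky_cloud, online=True))   # lights_on
print(hybrid_handle("What's the weather?", good_cloud, online=False))  # unavailable_offline
```

An equally plausible alternative is edge first for commands the device knows locally, which lowers latency for common actions and uses the cloud only for queries that need real-time data.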
Revenue Model
Market Momentum
The company’s entry into the Voice AI space began with 10 years of constant innovation in ‘stealth’, building disruptive Voice AI technologies using innovative approaches. The company’s goal was to build a differentiated Voice AI technology that it fully owns and that is significantly better than other solutions in the market. The company achieved that goal, unveiling the result in 2015 and launching it as the Houndify platform in 2016.
Building a Diverse, Global Customer Base
SoundHound continues to expand the capabilities that make the company well-positioned to serve the needs of customers globally. The company has grown its solution from a single language capability to 25 languages.
The company’s customers include product and service providers of all sizes across a range of industries, including automotive, IoT, apps, restaurants, and more. Many of the company’s global customers have end users in multiple regions and industries, and they have used the company’s products successfully across multiple contexts and purposes.
Three Pillars for Growth
The company has identified three pillars for revenue growth: Royalties, Subscription, and Monetization, and all three contribute to the company’s revenues. While the majority of revenues come from royalties and subscriptions, the company expects revenues from the Monetization pillar to increase meaningfully over time.
Royalties: This pillar involves voice-enabling a product. The product creator pays the company a royalty based on volume, usage, or duration. SoundHound collects royalties when its platform is integrated into a product such as a car, smart TV, or IoT device.
Subscription: This pillar involves voice-enabling a service that doesn’t rely on a physical product. For example, when SoundHound enables customer service or food ordering for restaurants, or content management, appointments, or voice commerce, the company generates subscription revenue from the service providers.
Monetization: This pillar creates an ecosystem that enables monetization services in products and services from both pillar one and pillar two. When users of a voice-enabled product access the voice-enabled monetization, this creates new leads and transactions. SoundHound generates monetization revenue for generating these leads and transactions, and will share revenue with the product creators.
The company expects its disruptive three-pillar business model to create a monetization flywheel. As more products integrate with the company’s platform, more users will use it, and more services will choose to integrate as well. This creates even more usage and a flow of revenue share to product creators, which further encourages adoption of and integration with the company’s platform. The company expects this cycle to continue expanding perpetually and to create an ecosystem with a compounding impact on its business.
SoundHound’s Criteria for Adoption
The company believes its products will be adopted because customers typically choose the best technology and want to own their brand and control their data. The company strives to provide its customers with the best technology and can provide a white-label solution that gives customers control of their brands. In some industries, customers may have to choose between technology and brand control. The company intends to offer its customers the best of both, enabling them to offer disruptive technologies to their users while maintaining control of their brand and user experience.
With the company’s disruptive monetization strategy, the company also provides a path to monetization. By choosing the company’s platform, product creators can generate additional revenue while making their product better using Voice AI, providing further incentive to choose the company’s platform.
The company offers an ecosystem with expert domain selections that seamlessly arbitrate between LLMs and real-time domains and limit ‘AI hallucinations’, along with definable privacy controls, which are becoming increasingly important in the Voice AI industry. The company also offers edge, cloud, and hybrid solutions, so its technology can optionally run without a cloud connection for increased flexibility and privacy. The company focuses on delivering the most advanced Voice AI in the world, allowing its partners to differentiate and innovate their overall experiences for their brands.
Product creators know their products and users best. The idea of a single third-party assistant taking over their product does not reflect the future the company anticipates. The company envisions that every product will have its own identity, with Voice AI customized in different ways. Product creators can each tap into a single platform to access the ever-growing set of domains while innovating on top of the platform and creating value for their end users in their own way.
Sales and Marketing
The company takes an insight-driven, account-based marketing approach to build and expand the company’s relationships with customers and partners. The company collects feedback directly from them to garner insights that help drive the business and product. The company also works with analysts and higher education institutions to conduct studies, test and validate technology performance, providing key proof points for those considering the company’s products. In parallel, marketing and communications drive the company’s brand equity and narrative through ongoing announcements, campaigns, events, speaking opportunities, and public relations efforts.
The company’s demand generation efforts span the full customer funnel to target prospects across a variety of channels, including advertising, email, social media, search engines and many other digital channels.
Sales and marketing will play a critical role in the next phase of the company’s evolution, with key ongoing investments in its team and leadership. While the company’s products are already scaling with existing customers, markets, and verticals, the company sees significant opportunities to grow into new ones. Increased sales and marketing efforts will enable the company to capitalize on the tremendous momentum it is building, and it expects to continue expanding resources to grow its sales and marketing personnel and leadership team.
During the year ended December 31, 2024, one customer accounted for 14% of SoundHound's total revenues.
Intellectual Property
SoundHound’s intellectual property portfolio includes over 192 patents granted and over 109 patents pending worldwide. These patents cover areas such as speech recognition, natural language understanding, machine learning, human interfaces, and others, including monetization and advertising.
Out of the company’s 301 patents granted and pending, more than 40 of these patents are in conversational monetization.
Government Regulations
The company is subject to various laws, regulations, and permitting requirements of federal, state, and local authorities related to health and safety, consumer privacy, anti-corruption, export controls, and AI. These may include the U.S. Foreign Corrupt Practices Act of 1977, the U.S. Export Administration Regulations, the Money Laundering Control Act of 1986, and any equivalent or comparable laws of other countries. The company is in material compliance with all such laws, regulations, and permitting requirements.
History
SoundHound AI, Inc. was founded in 2005.