Continuous low-power music recognition (research.google.com)
113 points by stablemap on Jan 11, 2018 | 54 comments


...this doesn't seem like it is at all worth giving up 1% of battery life. I don't think it's even worth giving away 0.25% of battery life to detect whatever 70k songs are stored in the database, nor is it worth the space the database takes up on the device. The question of "what song is this" comes up maybe once a month at best, and apps like Shazam already exist and have much deeper databases to search through. In other words, it does a worse job than existing solutions and uses power constantly.

It feels more like yet another solution looking for a problem. Worse, it screams like a foot in the door for telling users it's okay for microphones around you to always be listening, à la Amazon Echo. It also weakly smells like it'll immediately be used to send packets of what songs were listened to and similar frequency info off-device to be collected by the Google Big Data machine to be sold to RIAA members, as yet another way of extracting ad dollars from Android.


I would argue that it's not worth my 30 seconds to unlock my phone and open Shazam (which btw turns on the display and the radio, which then triggers other apps to poll for notifications, so there goes your 1%)

I would also bet the database is probably not much bigger than the Shazam app itself.

I frankly just find it a well balanced solution both in terms of UX and in terms of engineering.


> it's not worth my 30 seconds to unlock my phone

It takes me ~4 "seconds" (saying "one-one thousand" aloud) to do it when typing in a password (sorry, don't have a stopwatch handy). Just did it 10 times just to test. It'd be even faster with pin/fingerprint/face unlocking, or with no locking at all.

I also used Shazam and Google's "what song is this" feature for somewhere less than 5 minutes on the initial few seconds of 10 songs and my battery indicator didn't move a single percentage point during my testing. Shazam scored 10/10, Google missed one (a song from an esoteric Chicago band called Terminal Bliss - all of the songs I chose were intentionally fairly obscure to try to find missing info in the databases).

But I can say with fair assurance that Google's not shipping their entire database on phones, just whatever's most popular up to some (probably size-driven) limit: any given CD releases with about ten songs, and any given year sees tens of thousands of CD releases (about 75,000 albums in 2010 alone). Worse, the entire feature's use case is better suited to less popular music, since you're less likely to know the song's name if it's new or esoteric - so you can almost immediately guess that those 70k songs in that database are just ~30 years of various genre chart-topping hits.
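
A back-of-the-envelope sketch of that arithmetic in Python, using only the figures quoted above (about 75,000 albums in 2010, about ten songs each, a 70k-song on-device database). The function name is made up for illustration, and treating every album as ten new songs is a simplifying assumption.

```python
def database_share_of_one_year(db_songs: int,
                               albums_per_year: int,
                               songs_per_album: int) -> float:
    """Fraction of a single year's released songs that db_songs could cover."""
    songs_per_year = albums_per_year * songs_per_album
    return db_songs / songs_per_year


# Figures from the comment; the ten-songs-per-album average is a rough guess.
share = database_share_of_one_year(70_000, 75_000, 10)
print(f"70k songs cover about {share:.1%} of one year's releases")
```

Under those assumptions the whole database is smaller than a tenth of a single year's releases, which is the point being made about it holding mostly popular tracks.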

All you've done is cement my convictions that this is absolutely a solution in search of a problem.


> It takes me ~4 "seconds" (saying "one-one thousand" aloud) to do it when typing in a password (sorry, don't have a stopwatch handy). Just did it 10 times just to test. It'd be even faster with pin/fingerprint/face unlocking, or with no locking at all.

Ah, but did you test this in your quiet office or home, sitting at your desk with both hands free?

Let's try a real-world environment. Something like, say, driving in your car, in moderate traffic, with both hands on the wheel while listening to your radio. That ~4 seconds becomes a barrier-to-entry just large enough to abandon the idea of finding out what the name is to that song you are listening to.

Because, let's be honest, wondering what the song is isn't that important. But it is nice. Which is why this feature has some proponents.


> I would also bet the database is probably not much bigger than the Shazam app itself.

I don't know what you're trying to say here. Shazam has millions of songs in their database; according to the original paper, the offline mobile recognition system by Google has just something over 70k fingerprints. That's the big benefit of using an online system: you don't have to store all the fingerprints locally, and you don't have to stick to the low-power constraint.


> ...this doesn't seem like it is at all worth giving up 1% of battery life.

I view it differently. Percent of battery life isn't an interesting metric in a vacuum. It's whether you easily/regularly get the phone back to a charger without it feeling burdensome. That is to say, does the phone last all your waking hours without threatening to turn off.

I own one of these fancy new Pixel 2's. I find that the phone is usually above 30% when I'm returning to bed at night. The music detection has proved useful at times, and a single percent of battery isn't going to make me sweat.

However, in a year or two, when the battery starts losing performance, you may be right. My old phone, a 5X, often dipped close to 5% by the time I got to bed. I had to go out of my way to find a charger, which was frustrating. Sacrificing yet another bit of battery for this feature would only make that worse.


I have found it to be a useful feature that is worth the 1% daily battery life. I prefer it to Shazam/Soundhound because it's passive. I don't have to unlock my phone, open an app, press a button, and wait for the app to recognize the music. That whole process is disruptive to conversation if I'm with someone. How often the question of "what song is this" comes up depends on who you are, and it has occurred to me often enough such that I've recognized that I much prefer using Now Playing.

I agree that the database is a major limitation, although for me it acts as a first pass to see if the song I'm listening to is popular enough to be recognized.
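
The "first pass" idea can be sketched as a two-step lookup: check a small on-device table, and only go online when that misses. This is a toy illustration, not how Now Playing or Shazam is implemented; the string fingerprints, the `identify` function, and both lookup tables are hypothetical.

```python
from typing import Callable, Optional


def identify(fingerprint: str,
             local_db: dict[str, str],
             online_lookup: Callable[[str], Optional[str]]) -> tuple[Optional[str], str]:
    """Try the small on-device table first; fall back to an online lookup."""
    title = local_db.get(fingerprint)
    if title is not None:
        return title, "on-device"
    # Only reached on a local miss, so popular songs never touch the network.
    title = online_lookup(fingerprint)
    if title is not None:
        return title, "online"
    return None, "unknown"


# Hypothetical data: a tiny local table and a stand-in for a remote service.
local_db = {"fp-hit": "Popular Song"}
remote_db = {"fp-rare": "Obscure Song"}

print(identify("fp-hit", local_db, remote_db.get))
print(identify("fp-rare", local_db, remote_db.get))
print(identify("fp-none", local_db, remote_db.get))
```

The design choice is simply ordering: the cheap, private lookup runs passively, and the expensive one runs only when the user decides the miss matters.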


> The question of "what song is this" comes up maybe once a month at best, and apps like Shazam already exist and have much deeper databases to search through.

Very bold anecdata claim.

I would love to see Shazam numbers as to the actual median frequency per month.

I love this feature on my Pixel 2. I glance at my phone at least 2-3 times a day just for this feature. To me it adds another layer of separation from being tied to my phone. Previously, pulling up Shazam would bait me into engaging with notifications I didn't need to engage with.

I do agree with the privacy concerns though.


Yeah, I agree. If the phone doesn't recognize the song you want, you'll use Shazam anyway. So in the end, you can't trust the system 100% so the benefits of having this feature are pretty limited.


I have a Pixel 2 XL, like the feature, and keep it on my lock screen. But I am horrible at remembering song names. I like it most when I'm driving and the radio is not the type that displays the song name.


> Now Playing, has a daily battery usage of less than 1% on average, respects user privacy by running entirely on-device and can passively recognize a wide range of music.

Maybe I am missing something, but this seems to be much less respectful of my privacy than an app that only listens to the ambient sound around me when I explicitly give my consent (by opening/activating the app).


What's your actual privacy concern?

If the whole process happens entirely on-device, then this reveals absolutely no information of any kind to anyone, correct?


> If the whole process happens entirely on-device

Sure, if. Which I can verify, but I don't want to have to. I don't want to have to verify that a corporation isn't fucking me at every turn.

And it's possible that a vulnerability will be discovered that lets people stream the audio somewhere, or listen for key words and send just those portions. I don't know enough about Android to know whether it's more plausible for that to happen, or if that sort of vulnerability would likely grant an attacker access to my mic even without this feature.

I don't think it's unreasonable for me to expect my device to not always be listening to me.


> Sure, if. Which I can verify, but I don't want to have to.

Wouldn't that be a risk even if this feature didn't exist? How do you know your phone isn't currently listening for music/speech and sending that data to the cloud without your knowledge?

> And it's possible that a vulnerability will be discovered

Again though, that's already true regardless of whether this feature exists or not. Is there any reason to believe this feature is more likely to have a vulnerability than any other feature on your device? Why would you trust this code less than, for example, the code for your WiFi driver?


> Sure, if. Which I can verify, but I don't want to have to.

The great thing is that you don't have to! There's plenty of other people that are happy to do so and report their findings. The app would be outed as fast as other software that has attempted to send personal data without disclosure (Windows telemetry, Samsung TVs, etc.). I doubt Echo could ever get away with sending everything to Amazon's servers. And if you don't trust that others would do that, then you just don't have to use it, or use alternatives like Shazam.


Then don’t turn it on; it prompts you and asks permission to be enabled during setup.


In 2018, that is unreasonable. Not philosophically, but pragmatically speaking.


Anything sensed can be transmitted, though not necessarily in its current form. It can be used to train some ML algorithm about you, and the parameters can be transferred. Privacy should start at the sensing level.


So you're saying you'd prefer that your phone not have a microphone? Because that's where this information is ultimately coming from. (At the "sensing level".)


Maybe we need physical switches for the microphone and the camera.


This option has to be explicitly turned on. It's off by default (at least it was on my German Pixel 2).


It was enabled on my phone - which I discovered at the dentist's office. Now, it's possible that I idly clicked "ok" while setting up my Pixel 2 - it's not out of the realm of possibility - but I'm not very likely to turn that on at all.

Germany has much more stringent privacy regulations than the US does - maybe that's the difference.


I've just recently set up a Pixel 2 XL straight from factory a few times over the last couple weeks, all in the US. Every time I had to opt in to the background music detection feature, even in the US.


I must have toggled it on, then.


Can verify that in the US it was off by default and I had to turn it on. It is a feature I like and use pretty often.


Of course their baseline for privacy is not "leaks as little user data as possible", but "does everything on the cloud". Hence it seems to respect user privacy from their point of view.

Not that they have a reasonable point of view, of course.


While I agree with you that "cloud-private" is a new baseline that may not be a great thing - that's a strange point to make, given the text explicitly calls out that it's not doing anything on the cloud. It's all on-device.


Less than 1% sounds high to me. If everything on the phone used 0.3-7%, it adds up pretty quickly.
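
A minimal sketch of how that adds up, assuming every always-on feature costs the same share of the battery per day. The per-feature percentages and the count of 20 features are invented purely for illustration.

```python
def combined_daily_drain(per_feature_percent: float, feature_count: int) -> float:
    """Total daily battery share if every feature costs the same percentage."""
    return per_feature_percent * feature_count


# Hypothetical per-feature costs, each multiplied across 20 background features.
for pct in (0.3, 0.7, 1.0):
    total = combined_daily_drain(pct, 20)
    print(f"{pct:.1f}% x 20 features = {total:.0f}% per day")
```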


Great work, but super scary for me. The "service" of music recognition will probably be extended very quickly to a constantly running "environment recognition", just fingerprinting the audio around you: places, speakers, etc.


Yes, this has a “proof of concept” feel to it.


I have the Pixel 2 with this feature, and it works remarkably well. No noticeable power drain, and it identifies a lot of songs that I never would have thought to ask about, like TV show intros and such. It's handy that if I ever wonder what a song that's playing is, I can pull out my phone and it's usually already there instead of having to unlock it, ask it what's playing, and wait a bit, if the internet is even working there. Pretty cool that they can pull this off with no network communication.


Seems to me that it’s another part of a fingerprint for a user that can locate them, and possibly place you with others in the vicinity who have ID’d the same audio. Retail outlets have playlists that could be ID’d, possibly with enough granularity to indicate your path through a shopping mall or your dwell time in a particular department.

I know the recognition is done on-device for some percentage of tracks, but it’s been unclear to me what happens to the running tally of times/locations/tracks identified afterward.


It's not sent anywhere or even stored locally on the device. Some people view that as a missing feature, actually, so there are apps that add this functionality[0], so you can review the songs you heard and find them later.

[0] http://www.androidpolice.com/2017/10/26/add-track-id-history...


> author = {Beat Gfeller and ...

A case of nominative determinism?

https://en.wikipedia.org/wiki/Nominative_determinism


Beat is a _very_ common name in Switzerland. I know about 10, none of them work in music :)


I wasn't aware, interesting!


Is it pronounced bay-at?


From https://en.wikipedia.org/wiki/Beat_(name):

> pronounced "BEH-awe-t"


Wow that explains the many meteorologists in my area with names like Rains


I wonder how well this performs. I see I have roughly 2000 songs on my SD card. I find it a little hard to believe that 40x that is enough to cover a big enough slice to actually answer the question of "what's this" when it comes up.

[ed: I actually think I'd have plenty of room for 70k songs in high quality ogg vorbis vbr or equivalent - that'd frankly be more interesting... ]
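
A rough way to sanity-check the storage side of that, as a sketch: the 4-minute average track length and 160 kbps average bitrate are assumptions picked for illustration, not figures from the thread, and real VBR files will vary.

```python
def library_size_gb(song_count: int,
                    avg_seconds: float = 240.0,
                    avg_kbps: float = 160.0) -> float:
    """Approximate storage, in decimal gigabytes, for song_count compressed tracks."""
    bytes_per_song = avg_seconds * avg_kbps * 1000 / 8  # kilobits/s -> bytes
    return song_count * bytes_per_song / 1e9


print(f"2,000 songs:  ~{library_size_gb(2_000):.1f} GB")
print(f"70,000 songs: ~{library_size_gb(70_000):.1f} GB")
```

Whether that fits depends entirely on the card size and on the assumed bitrate; lowering `avg_kbps` shrinks the total proportionally.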


Only being able to compare to 70k songs isn't really going to be adequate the second you step off a mainstream playlist.


Will they use this tech for non-music purposes? Such as for enhancing google ads or for sending keywords to the NSA?


Automatically identifying whether you're at an illegal public performance and sending lawyers your way to extract the songs from your ears.


Well, change the word "music" to "conversation" and "song" to "person" (or "phrase"), and things start to get pretty scary.


Probably already have.


Using neural nets for this purpose makes it likely to be susceptible to adversarial examples. Fairly harmless in this use case, but if a similar system would be used for ContentID on YouTube, that could be exploited to subvert it (i.e. spam it with false positives and contest the verdicts en masse).


Is this why Shazam sold? Its basic functionality was going to be built into Android/Google Assistant?


It already is available on both the iPhone and Android through Siri/Google Assistant. Just ask them "What song is this?" This just seems to be an extension of it so you don't need to ask, the phone can just display the album info passively when you pick up the phone. Of course Apple's integration is already using Shazam for the actual identification, but you don't need the app installed.

Hard to compete with that if you aren't integrated at an OS level.


I think it was inevitable that this feature would get built into phone OSes and at that point nobody would want a separate app.


Is this Pixel 2 exclusive or can I install this on my Xiaomi? I skimmed the paper but didn't find any name for that app.


This is one kind of HN post that drives me up the wall.

You say less power to get us to buy more?

How about saving all that energy and using it for real world building.


To be fair, they say:

> Since everything runs locally on the device without sending either audio or fingerprints to a server, the privacy of the user is respected and the whole system can run in airplane mode.

This is far more respectful of user privacy than we usually see from Google. I, for one, am impressed.


What does this have to do with the comment you replied to?


A complaint about marketing ("get us to buy more"), on an article about always-on audio processing in a phone, implies a concern about privacy-invading user tracking. I was pointing out that (on the face of it, at least) this doesn't seem to do that.



