Show HN: Sonauto API – Generative music for developers
sonauto.ai
Hello again HN,
Since our launch ten months ago, my cofounder and I have continued to improve our music model significantly. You can listen to some cool Staff Picks songs from the latest version here https://sonauto.ai/ , listen to an a cappella song I made for my housemate here https://sonauto.ai/song/8a20210c-563e-491b-bb11-f8c6db92ee9b , or try the free and unlimited generations yourself.
However, given that there are only two of us right now competing in the "best model and average user UI" race, we haven't had the time to build some of the really neat ideas our users and pro musicians have been dreaming up (e.g., DAW plugins, live performance transition generators, etc.). The hacker musician community has a rich history of taking new tech and doing really cool and unexpected stuff with it, too.
As such, we're opening up an API that gives full access to the features of our underlying diffusion model (e.g., generation, inpainting, extensions, transition generation, inverse sampling). Here are some things our early test users are already doing with it:
- A cool singing-to-video model by our friends at Lemon Slice: https://x.com/LemonSliceAI/status/1894084856889430147 (try it yourself here https://lemonslice.com/studio)
- Open source wrapper written by one of our musician users: https://github.com/OlaFosheimGrostad/networkmusic
- You can also play with all the API features via our consumer UI here: https://sonauto.ai/create
We also have some examples written in Python here: https://github.com/Sonauto/sonauto-api-examples
- Generate a rock song: https://github.com/Sonauto/sonauto-api-examples/blob/main/ro...
- Download two songs from YouTube (e.g., Smash Mouth to Rick Astley) and generate a transition between them: https://github.com/Sonauto/sonauto-api-examples/blob/main/tr...
- Generate a singing telegram video (powered by both our API and Lemon Slice's): https://github.com/Sonauto/sonauto-api-examples/blob/main/si...
You can check out the full docs/get your key here: https://sonauto.ai/developers
We'd love to hear what you think, and are open to answering any tech questions about our model too! It's still a latent diffusion model, but much larger and with a much better GAN decoder.
Interesting that Suno et al. miss the obvious problem that actual musicians need extra musicians for their own projects.
For instance, a guitarist will have a track they wish they had vocals (and lyrics) for, and if they could pay for that, they would.
Literally, if you could highlight a tune section in your DAW, prompt it, and get vocals + lyrics generated, possibly different versions or harmonies for existing parts, etc. Musicians already pay for plugins, but the singing ones have been awful to use so far.
We're super interested in working on this (and melody conditioning) and even have some of the code written to generate the training data, but we want our base model to get a bit better before this becomes our main focus. Check back in a few months!
Honestly, this is a good use case, and I still don't think I'm a fan. It's only one step beyond a drum machine, so maybe I can stomach it eventually, but as a guitarist I love writing riffs and songs and just don't have the time and patience to put together decent-sounding drum tracks to go with them. GarageBand/Logic and others have added an AI drummer, but it still doesn't feel great.
I would probably be happy paying for a service I could drop a riff into and get a decent drum track that goes with it. Even better would be one that modifies and adapts while I'm recording or playing, and whose output can be recorded and clipped. Something that fits a clean workflow. If anyone makes this, please don't make it as much of a pain as most VSTs and plugin systems, where there are like 4 different installers and licensing software layers.
On one hand this is impressive, and I've been wondering when something like this would appear. On the other hand, I am -- like others here have expressed -- saddened by the impact this has on real musicians. Music is human; music theory is deeply mathematical and fascinating -- "solving" it with a big hammer like generative AI is rather unsatisfying.
The other very real aspect here is that "training data" has to come from somewhere, and the copyright implications of this are far from solved.
In the past I worked on real algorithmic music composition: an algorithmic sequencer, paired with hardware or software synthesizers. I could give it feedback and it'd evolve the composition, all without training data. It was computationally cheap, didn't infringe anyone's copyright, and a human still had very real creative influence (which instruments, scale, tempo, etc.). Message me if anyone's still interested in "dumb" AI like that. :-)
Computer-assisted music is nothing new, but taking away the creativity completely is turning music into noise -- noise that sounds like music.
> "solving" it with a big hammer like generative AI is rather unsatisfying.
The reason is greed. They jump on the bandwagon to get rich, not to bring art. They don't care about long term effects on creativity. If it means that it kills motivation to create new music, or even learn how to play an instrument, that's fine by these people. As long as they get their money.
If our sole goal was to get rich, we would have pivoted to some B2B SaaS thing as many suggested to us. What we’ve actually seen is so much new creativity from people who otherwise would never have made music.
Nothing was stopping them from making music before other than laziness.
I’m so sick of hearing this excuse. “I can’t draw so I use AI,” as if the people who can draw were born that way.
No, they spent countless hours practicing and that’s what makes it art. Because it’s the product of hours of decision making and learning. You cannot skip ahead in line. Full stop.
This just sounds like gatekeeping.
I think it's the opposite. They are not saying "those people shouldn't draw [using AI]", they are saying "those people should've been drawing all this time".
Many people may not have the time, talent, and/or dedication, but I still think they should get to make music.
Describing music to an AI is not "making music", the same way that hiring a musician and asking them to write you a rock song about a breakup is not "making music".
They can do so anytime they pick up an instrument. But plugging things into AI is not making music.
> Message me if anyone's still interested in "dumb" AI like that. :-)
Not sure how to reach out, but I'm definitely interested in reading about procedural methods in music synthesis. Any links describing your approach?
Added a link in my profile that leads to a brief demo and description. Not posting here as it'd crumble under too much load. :-/
You mean like how real pianists suffered when the automated piano came, or how live music died when the record player came?
Actually, noise that sounds like music is some of the best music there is: electroacoustic music.
A lot better than most music on the radio. ;-)
> Message me
I don't see any contact info in your profile, but I have an email in mine. I am interested in hearing more about your process and if you have music for sale anywhere, I like to support electronic artists doing interesting stuff.
same here
I added a link in my profile if you're curious.
Anyone with ears can find music satisfying. You don't need an artist's backstory or blessing for that. By all means use slow AI to get to the same point fast AI can get to, but don't ask me to value it differently.
I have many times watched guitar players at their work, and gone home to try and do the same thing. I definitely value that differently.
And AI doesn’t make satisfying music. Music is partially derivative in the human sector, but only derivative in AI. That’s why it sounds like shit to reasonable ears.
It’s less than worthless.
I really wish this trend of prompting gen AI models with text would stop. It's really meaningless. Musicians need gen AI they can prompt with a melody on their keyboard. Or a bit of whistling into the microphone. Or a beat they can tap on the table. That is what allows humans to unleash their creativity, not AI generating random bits that fit a distribution of training data. The English language is not the right input for anything except information retrieval tasks.
Agreed! Those will be much more fun and we plan to support that. However, right now we're focused on making the base model slightly better, then we can easily add all of those controls (a-la ControlNets with Stable Diffusion).
But this is not easy; it's the real challenge here, as there are lots of text-to-audio models out there. It is far from solved for Stable Diffusion as well. ControlNet is pretty bad. Just try taking a photo of an empty room and asking an image model to add furniture. Or to change a wall colour. Or to restyle an existing photo in the style of another, and so on. We are very far from being able to truly control the output generated by AI models, which is something that a DAW excels at. I'd start with an AI-powered DAW rather than text-to-audio and try to add controls to it. It's like Cursor vs Lovable, if you get my drift.
> Not AI generating random bits that fit a distribution of training data
How is that specific to text prompting? If you tap your fingers to a model and it generates a song from your tapping, it's still just fitting the training data as you say.
The current AI music apps have a certain chunking problem: they force you to extend the song with segments that may or may not fit. Users likely pick whatever is "good enough" and end up with Frankenstein mash-up songs that have no coherent "flow" or "progression", because the result is actually chunks of "similar sounding songs" merged together, not a coherent "full song generation" by the AI.
I don't think that is the problem for Sonauto V2, on the contrary it is more a challenge that the model is too consistent with preceding content.
Here are a few of my songs, I think they are fairly consistent?
https://sonauto.ai/song/e2e3d210-69b4-4ad7-96d1-fb5744d0c648
https://sonauto.ai/song/a94e04a9-7b74-4b87-b5ed-ca3e8d2798d0
https://sonauto.ai/song/55a36595-c60a-4346-81d8-6f03ebe690ff
One thing I've been thinking about is how to do a better hobbyist plan system. It would be cool to do a flat rate unlimited plan, but we wouldn't want that to then be abused by larger customers/companies. Are there existing API providers you think solve this particularly well?
I don't think it meets your ask of "solve this particularly well", but the unlimited plans in video that I am familiar with have a fast/slow queue system. This effectively limits the plan. It seems, as well, that these kinds of queue systems are tiered. So you can have N number of fast queued items, X number of tier one slow queue, Y number of tier two slow queue, etc. On the backend this is probably just some kind of weighted priority queue where the number of requests in some time duration determines some weight scaling factor.
I think this is a good start, X high speed queries per hour then unlimited low-priority ones after. Do you know of any specific companies that do this we could take a look at?
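A minimal sketch of the fast/slow tier idea above, in Python. Everything here is hypothetical (the class name, the quota of `fast_per_hour` jobs per rolling hour, the `slow_tier_size` step between slow tiers); it only shows how a per-user rolling count can pick a tier, and how a heap keyed on `(tier, arrival order)` serves lower tiers first while staying FIFO within a tier.

```python
import heapq
import itertools
import time
from collections import defaultdict, deque


class TieredQueue:
    """Hypothetical sketch: each user gets `fast_per_hour` fast-lane jobs per rolling
    window; further jobs drop into slow tiers, one tier per `slow_tier_size` extra jobs."""

    def __init__(self, fast_per_hour=20, slow_tier_size=50, window_s=3600.0):
        self.fast_per_hour = fast_per_hour
        self.slow_tier_size = slow_tier_size
        self.window_s = window_s
        self._recent = defaultdict(deque)  # user -> submission timestamps in the window
        self._heap = []
        self._seq = itertools.count()  # arrival order, keeps each tier FIFO

    def _usage(self, user, now):
        # Forget submissions that have aged out of the rolling window.
        q = self._recent[user]
        while q and now - q[0] >= self.window_s:
            q.popleft()
        return len(q)

    def submit(self, user, job, now=None):
        now = time.monotonic() if now is None else now
        used = self._usage(user, now)
        self._recent[user].append(now)
        if used < self.fast_per_hour:
            tier = 0  # fast lane
        else:
            tier = 1 + (used - self.fast_per_hour) // self.slow_tier_size
        heapq.heappush(self._heap, (tier, next(self._seq), user, job))
        return tier

    def pop(self):
        """Return (tier, user, job) for the lowest tier, oldest job first."""
        tier, _, user, job = heapq.heappop(self._heap)
        return tier, user, job
```

A real service would also want aging (so slow-tier jobs are not starved forever under sustained fast-tier load) and persistence; both are left out here.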
Remember that you’ve also got a nice natural limitation here: if it’s a hobbyist and not a (commercial) API consumer, there’s only so fast they can listen to the output. Even if they’re rapidly tweaking knobs in a DAW, you can use the play/pause signal to help prioritize the queue, depending on how expensive it is to serialize the GPU state and rehydrate it again. You also might not need to complete generation until the user reaches the play point, so you can shuffle around the queue a lot. For example, if the user skips after ten seconds you might not need to generate the rest until they try to play that track again, and when they do, you usually have enough time before they reach the previous stopping point to generate some more sections.
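One way to sketch the playback-driven prioritization described above, assuming (hypothetically) that a track is generated in fixed-length segments and the client reports its playhead and play/pause state. Paused sessions request nothing, playing sessions request only the next few segments, and jobs across users are ordered earliest-deadline-first by how soon playback will reach each segment. The names `Track`, `pending_segments`, `deadline_s`, and `schedule` are illustrative, not part of any real API.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    """Hypothetical track state: fixed-length segments, some already generated."""
    segment_s: float
    n_segments: int
    generated: set = field(default_factory=set)


def pending_segments(track, playhead_s, playing, lookahead=2):
    """Segments to generate next for one listener, nearest first.

    Nothing is requested while paused; while playing, only the segment under the
    playhead and the next `lookahead` segments are considered."""
    if not playing:
        return []
    current = min(int(playhead_s // track.segment_s), track.n_segments - 1)
    last = min(current + lookahead, track.n_segments - 1)
    return [i for i in range(current, last + 1) if i not in track.generated]


def deadline_s(track, segment, playhead_s):
    """Seconds until playback reaches `segment` (0.0 if it is already due)."""
    return max(0.0, segment * track.segment_s - playhead_s)


def schedule(sessions):
    """sessions: {user: (track, playhead_s, playing)} -> jobs, earliest deadline first."""
    jobs = []
    for user, (track, playhead_s, playing) in sessions.items():
        for seg in pending_segments(track, playhead_s, playing):
            jobs.append((deadline_s(track, seg, playhead_s), user, seg))
    return sorted(jobs)
```

Skipping ahead just moves the playhead, so segments the listener jumped over are never requested unless playback comes back to them.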
It might also be helpful to come up with some ways to segregate customers so that “prosumer” users get faster “cold starts” (so that they can iterate faster) at the expense of sometimes having to wait for generation to start back up again.
runway.ai (video gen) is what I was thinking when I suggested this.
Why would a hobbyist need an unlimited plan?
E.g., in the case of a future "LibreMusic" open source UI or an integration into their DAW they work with on the weekends. I'd get pretty annoyed if I had to keep putting a coin in the machine to adjust Logic Pro effects.
So if I make a song using this API, who owns the copyright? Is it me or Sonauto?
I'm not sure to what extent AI music is copyrightable (I think it depends on a case-by-case amount of human influence) but our TOS assigns any rights we may have to the user.
From their terms (https://sonauto.ai/tos):
8. OUTPUT As between You and the Services, and to the extent permitted by applicable law, You own any right, title, or interest that may exist in the musical and/or audio content that You generate using the Services ("Outputs"). We hereby assign to You all our right, title, and interest, if any, in and to Your Outputs. This assignment does not extend to other users' Outputs, regardless of similarity between Your Outputs and their Outputs. You grant to us an unrestricted, unlimited, irrevocable, perpetual, non-exclusive, transferable, royalty-free, fully-paid, worldwide license to use Your Output to provide, maintain, develop, and improve the Services, to comply with applicable law, and/or to enforce our terms and policies. You are solely responsible for Outputs and Your use of Outputs, including ensuring that Outputs and Your use thereof do not violate any applicable law or these terms of service. We make no warranties or representations regarding the Outputs, including as to their copyrightability or legality. By using the Services, You warrant that You will use Outputs only for legal purposes.
You own the rights, but Sonauto is granted the rights to use it as well.
>You own any right, title, or interest that may exist
>We hereby assign to You all our right, title, and interest, if any
>You are solely responsible for Outputs and Your use of Outputs
I love how it clearly lays out the scenario where the rights don't exist, yet you are responsible.
This is pretty cool! It's noticeably better than any of the other similar music generation tools I've tried, kudos!
This looks pretty cool to integrate into hobby projects; however, after creating an account via Google, clicking "Payment portal" shows this error:
Error creating billing portal Failed to create billing portal session: No configuration provided and your live mode default configuration has not been created. Provide a configuration or create your default by saving your customer portal settings in live mode at https://dashboard.stripe.com/settings/billing/portal.
Also, when trying to update my profile picture:
Failed to update image! column users.current_period_end does not exist
The Stripe issue should be fixed. The second issue likely happens if you visit the API page at some point in your session before going to the profile page and then try to edit your picture. We'll work on that. Thanks for reporting!
I would encourage everyone who thinks "I want to apply AI to music" to look at the existing problems that creators have, talk to them, and work to bring new products to market instead of things that devalue their work.
"Generate a rock song" is not a problem that working musicians have. "Take this riff I recorded with whatever guitar I have in the studio and show me what it sounds like as a Les Paul through a 5150 or a Strat through an AC30" is, though.
Well, I create music and know a fair bit about the history of music technology. Self-playing pianos, for instance, allowed composers to write pieces with new complexities that would be physically impossible to play. Drum machines opened up new types of music in the 1980s that would have been very unnatural for a physical drummer. In general, innovations in music come as consequences of technological advances, and you cannot predict in advance how composers/musicians will utilize these tools. The same is true for folk and classical music: the tempered scale, the piano, various types of viola-like folk instruments, etc. AI generators are just a new tool.
I have used Music AI for what you describe, by uploading a track I made myself and using diffusion on it with various genres, although I don't think that is the most interesting use case for Music AI.
Fusion is the most interesting use case IMHO.
I think many folks making the "music technology" argument tend to blur things so that human composition is reduced to "just another dial, just another new tech aid for music", "get off my lawn", etc. But it's interesting to note that generative AI is unique in that it needs massive amounts of music that has already been produced to train on. It's a very unique development with some interesting questions once you sharpen the picture a bit.
Let's not forget that extraordinary musical talent tends to be the result of MASSIVE exposure to music as a toddler. It isn't entirely a question of DNA or effort. They start by emulating what they hear, over time they might evolve their own style, or they may not. In other words, you are less likely to become an extraordinary musical talent if you weren't saturated with music before you started going to school. Nobody finds it problematic when a musical star is performing way beyond their age, even though he/she essentially is engaging in deliberate mimicry to a much larger extent than an AI usually would be.
Are you saying that it is wrong for a machine to learn? Even if it was, it is the massive amount it is based on that makes it less problematic, as that makes room for building abstracted knowledge.
I like to think of Mozart and the amount of music he listened to and was influenced by, and compare it to what (I imagine) is used by these machines. Heck, you don't even instruct the machines on scales, rhythms or other sorts of fundamental musical aspects; you just tell the machines to make noise that sounds like what they've hoovered up. I'm still convinced that AI music proponents want to blur and obscure the differences between human composers and machine "composers", but I personally think the differences are pretty interesting and worthy of consideration.
My best tracks are extended in 10-20 second sections. And I carefully choose the prompt and what to keep and where to continue (crop), sometimes just a few seconds. So you can compose with these models, basically, you have to if you want to get something that isn't bland.
So overall, yes, anyone can, with some luck create a "radio friendly" track with music AI, but not everyone can create good art with it. You still need to understand music, and the more you know about genres and composers and how they fit together the more power you have to create musically interesting patterns.
Keep in mind that Sonauto has 3400 generic tags + an undisclosed number of artist names. You need to understand what makes it possible to combine different Artists/Musicians and what could lead to something interesting.
So musical knowledge is still needed.
I am certain that AI music can produce something, but that training data needs to be figured out.
This is a baseless argument. A human being and a machine are fundamentally different. A human being synthesizes things from the world and it becomes a part of them. The art they produce is filtered through their unique brain and life experience and natural talents. All a machine does is spit out recycled versions of things that actual human beings have done.
Counterintuitively, we need to build the base model before we can do the advanced production stuff that’s most useful to existing pro musicians.
[flagged]
I feel like I've proven I know what I'm talking about, at least from an ML perspective, given that we trained the model on the website. It's not strictly required, but I know you'll get much higher-quality, more powerful tools much more easily by building on top of a solid music foundation model than by training a bunch of one-off specialized tools.
The transition between two songs demo is super cool! I often need to do this when editing videos but used to have no way to do it.
Not to mention that now you can have playlists that transition seamlessly between two songs. Low-cost party DJ?
I'm familiar with video and image diffusion model architectures, but know almost nothing about music models.
Are there any good papers or writeups on them?
Are there any open source implementations to play with?
There are!
Audio models are actually quite similar to image models, but there are a few key differences. First, the autoencoder needs to be designed much more carefully, as human hearing is insanely good and music requires orders of magnitude more compression (image AEs do 8x8 downsampling; audio AEs need to downsample by a factor in the thousands). Second, the model itself needs to be really good at placing lyrics/beats (similar to placing text in image diffusion): a sixth finger in an image model is fine, but a missed beat can ruin a song. That's why language model approaches, which have a stronger sequential inductive bias than diffusion models (good for rhythm and lyric placement), have been really popular in audio.
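To make the compression gap concrete, here is a back-of-the-envelope calculation. The numbers are assumptions for illustration: a 512x512 image with 8x8 downsampling, versus three minutes of 44.1 kHz audio where only the time axis is compressed; the 2048x audio factor is a hypothetical round number, not Sonauto's actual figure.

```python
import math


def latent_positions(n_samples: int, downsample: int) -> int:
    """Latent positions along one compressed axis (a partial final frame still counts)."""
    return math.ceil(n_samples / downsample)


# Image: 512 x 512 pixels, 8x8 spatial downsampling -> a 64 x 64 latent grid.
image_positions = latent_positions(512, 8) * latent_positions(512, 8)

# Audio: 3 minutes at 44.1 kHz. Only the time axis is compressed; stereo
# channels and latent channels do not change the sequence length.
SAMPLE_RATE = 44_100
n_samples = 3 * 60 * SAMPLE_RATE

print("image latent positions:", image_positions)  # 4096
for factor in (64, 2048):  # 64 = same overall factor as 8x8; 2048 = hypothetical
    # downsampling factor, sequence length, latent frames per second of audio
    print(factor, latent_positions(n_samples, factor), SAMPLE_RATE / factor)
# 64 124032 689.0625
# 2048 3876 21.533203125
```

At the same 64x overall factor as the image autoencoder, the song would still be about 124k latent positions long; it takes a factor in the thousands to get it down to roughly the size of a 64x64 image latent.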
If you're interested in papers (IMO not good for new people as they make everything seem more complicated than it is):
Stable Audio (similar to our architecture): https://arxiv.org/abs/2402.04825 (code: https://github.com/Stability-AI/stable-audio-tools)
MusicGen (Suno-style architecture): https://arxiv.org/abs/2306.05284 (code: https://github.com/facebookresearch/audiocraft/tree/main)
I'm not going to comment on the technical side of things, which is way beyond my technical comprehension skills, and I'm sure it required a considerable amount of brains, time and energy to reach similar results.
But music production and distribution is (actually, was) my home turf, so here's my two cents on the topic:
I've already heard music qualitatively on par with the tracks available on your demo page. I've heard it way more than I truly wanted or felt was necessary, at least once a day, for years, while tracking hundreds of albums you've never heard of on Pro Tools in studios in France and LA.
It was made by people with the best intentions, coming from all walks of life, and yet it was obvious from the first note they played that they were condemned to oblivion, their music destined to be basically never heard by anyone.
And this has been done every day, multiple times a day, in every studio around the world, since the '60s.
20% of Spotify music has never been played once. IIRC less than 40% has been played more than once.
There's a genuinely humbling scene in the 2002 documentary "Scratch" where DJ Shadow, a world-renowned DJ and producer, wades through stacks of EPs in a record store in NY that have never, ever been played once[1], which perfectly captures how little of the musical output being recorded we actually get to listen to.
Making music is very easy. Making music people want to listen to is hard, mind-bogglingly so. For every whitebread pop track you've heard on the radio, there are thousands of other similar tracks that have been discarded by an A&R, a radio DJ, some label, or simply by the audience.
I'm saying this with no ill feelings towards you or your work, but I can't conceive of even the flimsiest of reasons why anyone would ever listen to (or license/sync/track) any of those generated songs once the novelty of "music made by the AI" is gone.
[1]https://www.youtube.com/watch?v=1gpKYnRdf0A&t=6s
> I can't conceive of even the flimsiest of reasons why anyone would ever listen to (or license/sync/track) any of those generated songs once the novelty of "music made by the AI" is gone.
Easy: Independent/single-dev operations needing some quick background music for a project (game, whatever)
This is already easily solved by using royalty-free music or by licensing pre-made music from numerous publicly available sound libraries online -- with the added benefit of supporting actual musicians instead of plagiarist tech middlemen.
I think you got this wrong. Usually you use one electronic musician to create all the background music (or license pre-existing popular music), and with music AI you make that musician more productive. It is not like non-musicians will even be able to select the bad from the good of the AI output. It takes a trained ear to build functional AI music as well.
Nobody will hire live studio musicians or a symphony orchestra to create background music. Way too expensive.
What exactly am I getting wrong? You insisting that nobody hires studio musicians or orchestras, and claiming that "usually you hire one electronic musician" are both demonstrably false opinions that have almost no relation to my point that background music is obtainable through on-demand licensed libraries.
Also: People (especially kids, students, etc) who want to make music but don't have the technical expertise to (yet?).
Obviously these tools don't do everything necessary to make great music, but the barrier of entry to making music is being lowered, and the quality floor is being raised -- and that'll result in a lot more would-be "musicians"[0] creating music that wouldn't otherwise exist[1].
[0] I leave the argument of whether these generative musicians count as "real" musicians to the Scotsmen in the audience.
[1] Bonus question: does art still hold value if no one sees it?
o1's take on your bonus question seems reasonable:
Yes. Art can have intrinsic and personal value for its creator, independent of any external audience. Unseen art lacks immediate external value [to others] but retains latent worth, potentially realized when discovered or appreciated in the future.
The creator sees/hears it! (and if they don't it really shouldn't have been generated lol, waste of compute)
This comes up quite a lot - lowering the barrier to entry for creating a bunch of media that wouldn't otherwise exist.
Eh I don't think the world is exactly clamoring for even more music.
I can't speak for everyone's process, but if you don't know how to make music, I'm not convinced that this allows you to do so because the medium of input (aka writing text) is far too divergent from the resultant melodic output to allow for any kind of meaningful individuality.
> Also: People (especially kids, students, etc) who want to make music but don't have the technical expertise to (yet?).
But fuck all the people who have a career teaching them those skills as a part of a thousands-year long artistic tradition whose value isn't solely defined by the exchange of currency for lessons, but in that it subsidizes those artists' work which goes unpaid and furthers human experience.
It's wonderful techies with a surface level knowledge of the arts are cannibalizing the entire supply chain and marketplace so they can make a buck off the AI craze.
The barrier to entry is already zero. AI lowers the ceiling, not the floor.
> I'm saying this with no ill feelings towards you or your work.
I can. It’s predatory behavior, performed by people looking to steal and cash in on something they have neither the skill, the understanding, nor the love to make on their own.
> Making music is very easy. Making music people want to listen to is hard, mind-bogglingly so.
Agreed, although making music is "easy" once you've put in the hours to learn how to make music, and getting to somewhat professional standards requires a lot of time investment.
This reduces that to zero.
The point where it becomes a "problem" is the people abusing it to pump out hundreds (or dare I say thousands) of bullshit filler "music" to get some stream income at the expense of people who have put in the effort.
Technical skill has very, very little to do with the emotional impact of music.
One of the most famous mantras in punk music was "this is a chord, this is another, now make a band"[1]. "Imagine" by John Lennon is a song written using the simplest scale and chord progressions, at a very slow 4/4 tempo.
The hard part is not knowing the biggest amount of chords, it's knowing what not to use to carry your emotions.
Also, the "time investment" is the music. Once the final waveform hits the tape/DAW/recorder it's not art anymore, it's publishing.
And for most artists, "learn to make music" is usually the fun part. Complicated and frustrating sometimes, but rewarding. For a lot of them, the "now play it in front of other people" part is the truly annoying one, frightening on some occasions.
[1]https://austinkleon.com/2019/01/13/this-is-a-chord-this-is-a...
The benefit of AI generated music is that you can make it for yourself. The goal shouldn't be to get other people to listen to it. It's very personal and that should be the end goal.
Is it very personal if it was generated by a third party? Making music and, more importantly, playing it is an incredibly personal, physical experience. Especially if you're doing it for yourself rather than being a gun-for-hire session musician.
Getting it from a service like this is the equivalent of buying an already assembled lego kit. Putting aside ethical concerns (we're talking about the music industry after all, there were none even before AI arrived) is there a viable business in it?
The benefit of real human-made music is that you can make it for yourself. The goal shouldn't be to get other people to listen to it. It's very personal and that should be the end goal.
Can humans generate a song based on custom lyrics and style in a matter of minutes?
Whose Line Is It Anyway?
https://www.youtube.com/watch?v=XwIsvKpEgOA
I have mixed feelings on AI music. When I make music, it's relaxing, it's fun, maybe 3 or 4 people listen to it. Then it kinda just sits on Soundcloud. One of my partners did enjoy my music though...
AI music for the sake of just having it in the background removes that human element. It's just more stuff. To be fair, like you said, making generic music isn't anything new. But everything is turning into this. Games are using AI-generated music, which by definition isn't able to try anything new, and AI art, which is just regurgitated from other artists.
The enshittification of Spotify is here. Why pay artists $100 for 500k plays when you can just push AI music and pay $1 for every 500k plays? As it is, music (really any entertainment) is a horrible way to make money.
So I guess I'll just keep working on my beats with Maschine (the only software that keeps me on Windows!), and sharing them with a few people every now and then.
Why is it "music for developers"? I was expecting one of those lo-fi music videos designed to enhance concentration or similar. These are typically instrumentals, ostensibly because they are less distracting, something like this:
https://www.youtube.com/watch?v=M5QY2_8704o
It's because the thing we're launching today is an API for developers to use. If you want instrumental type stuff, you should check out my bossa nova channel: https://sonauto.ai/radio
Okay. I know these guys IRL. BUT, I genuinely think they have the best music model out there. Hands down. The songs are just more unique, and have a wider range of musical variation. With Suno/Udio, the songs just sounds the same after a while (just with different lyrics).
That could just be me though. I am curious what users of Udio/Suno think?
Quality has improved so much too, I tried it a few months ago at Demo Day and I’m blown away by how good it is now.
By far the best use case here is generating "Weird Al" style parody covers of pop songs by just changing the lyrics. Songs that everyone knows but with custom lyrics are way more interesting than songs nobody has heard before generated at random.
i've been occasionally trying to get some usable ambience tracks from these various models, but none of them seem to be able to produce looping tracks
based on results so far it also looks like a more flexible approach to ai generation would be to generate a set of stems/samples based on the user's description and let them actually compose, instead of producing complete audio (maybe this is already happening somewhere)
- in either case, these models will most likely need to produce properly looped tracks at some point
Creating looping sections with Sonauto is generally not very difficult, as the model is overly consistent when extending, so all you have to do is extend without changing the prompt and you should get a loop point.
If you have lyrics in your conditioning, you can usually use a [Chorus] section as your loop point, as most music AI models render chorus sections in a way that affords crossfades. THAT ASSUMES you keep the chorus sections less than about 60 seconds apart when using a 30-second extension, as the model can only see about 90 seconds of the track (or something in that range).
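The crossfade step of the chorus-as-loop-point approach can be sketched in plain Python. This is a minimal illustration, not Sonauto's API: it assumes you have already cut the chosen section out of the downloaded audio as a list of mono float samples, and `make_loop` and `fade_len` are hypothetical names. It uses a linear (equal-gain) crossfade, which suits material that is nearly identical at both ends, such as two renditions of the same chorus; an equal-power curve is the usual alternative for less similar audio.

```python
def make_loop(samples, fade_len):
    """Hypothetical helper: turn a clip into a seamless loop.

    The last `fade_len` samples are faded out while the first `fade_len`
    samples are faded in (linear, equal-gain crossfade), then the tail is
    dropped. The result wraps from its last sample straight back to its
    first sample without a discontinuity.
    """
    n = len(samples)
    if not 0 < fade_len <= n // 2:
        raise ValueError("fade_len must be between 1 and half the clip length")
    tail_start = n - fade_len
    out = list(samples[:tail_start])  # keep everything before the tail
    for i in range(fade_len):
        t = i / fade_len          # 0.0 at the loop seam, approaching 1.0
        fade_in = t               # head ramps up
        fade_out = 1.0 - t        # tail ramps down
        out[i] = samples[i] * fade_in + samples[tail_start + i] * fade_out
    return out
```

At 44.1 kHz, a 2-second fade would be `make_loop(samples, 2 * 44100)`; the clip has to be at least twice the fade length, and stereo audio would be processed one channel at a time.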
- no lyrics and no chorus: I mentioned ambience tracks; these are instrumental, often minimalistic, and don't attract attention...
> Creating looping sections with Sunauto is generally not very difficult as the model is overly consistent when extending
the point is not consistent extension alone, but consistent extension (if any) that also joins back up with the starting audio
have you actually done this and gotten a usable looped track?
You can create a looped track by combining the song generation and transition generation examples from our API example repo!
would you be so kind as to:
- link to both examples
- briefly outline the process of combining them for this use case
- since this is API usage, I suppose it's most likely not viable to obtain (reasonably) looped audio via user prompting (i.e. what's currently available in the browser); would that be correct?
Congrats on the API launch (from SkyPilot)!
Thanks! We used SkyPilot (an open source cloud GPU worker management tool) to help out with both our small (single node) and large (many node) training runs.
how did you create this without committing grand theft musica
The first 80s song I heard was a literal copy of Phil Collins. But there are no emotions attached to it (for me), and the lyrics are random. It’s more like supermarket background music IMHO, not something I would pay for, especially when we have centuries of music to discover already, why make fake stuff like that?
Edit: I have just heard the funniest most ridiculous metal song ever without a touch of metal inside. Breathe of Death, it’s like a bad joke.
If that's the future of anything, I’m going back to plain C (code) when I retire and I’ll never approach the internet ever again.
In my opinion, training on all music is no more theft than Taylor Swift listening to the radio growing up (as long as we don't regurgitate existing songs, which would be bad and useless anyway). I think an alternative legal interpretation where all of humanity's musical knowledge and history are controlled by three megacorporations (UMG/Sony/Warner) would be kinda depressing. If the above is true we might as well shut down OpenAI and delete all LLM weights while we're at it, losing massive value to humanity.
It’s intellectual property laundering. A company selling a button that launders the blood sweat and tears of generations of artists is not the same as a person being inspired and dedicating themselves to mastery.
Humans create value. AI consumes and commoditizes that value, stealing it from the people and selling it back to their customers.
It’s unethical and will be detrimental in the long run. All profit should be distributed to all artists in the training set.
It won't be a detriment to consumers, who ultimately decide the value. If I could AI-gen a better-tasting Coca-Cola for cheaper, that would be beneficial to consumers and Coke wouldn't deserve a cut. Get gud, as they say.
> In my opinion training on all music is no more theft than Taylor Swift listening to the radio growing up (as long as we don't regurgitate existing songs which would be bad and useless anyway).
I beg of you, speak to some real life musicians. A human composing or improvising is not choosing notes based on a set of probabilities derived from all the music they’ve heard in their life.
> I think an alternative legal interpretation where all of humanity's musical knowledge and history are controlled by three megacorporations (UMG/Sony/Warner) would be kinda depressing.
Your impoverished worldview of music as an artistic endeavor is depressing. Humanity’s musical knowledge extends far beyond the big 3.
> If the above is true we might as well shutdown OpenAI and delete all LLM weights while we're at it
Now we’re talking.
> losing massive value to humanity.
Nothing of value would be lost. In fact it would refund massive value to humanity that was stolen by generative AI.
The difference being that a musician being influenced by other musicians still has to work to develop the skills necessary to distill those influences into a final product, and colors that output with their own subjective experiences and taste. This feels like a conveniently naive interpretation to justify stealing artists' work and using it to create derivative generative slop. The final line in your comment is pretty telling of how seriously you take this issue (which is near-universally decried by artists) -- some other massive company is doing a bad thing, so why shouldn't I?
edit: I have to add how disingenuous I find calling out corporations owning "all of humanity's musical knowledge and history" as if generative AI music trained on unlicensed work from artists is somehow a moral good. At least the contracts artists make with these corporations are consensual and have the potential to yield the artist some benefit which is more than you can say for these gen-AI music apps.
I don't see how the amount of work that went into it changes the core fact that all art is influenced by that which came before, and we don't call that stealing (unless you truly believe that "all art is theft").
My point re: LLMs wasn't meant to exclusively be a "they're doing it" one, the hope was to give an example of something many people would agree is super useful and valuable (I work much faster and learned so much more in college thanks to LLMs) that would be impossible in the proposed strict interpretation of copyright.
edit responding to your edit:
Re: moral good: I think that bringing the sum of human musical knowledge to anybody who cares to try for free is a moral good. Music production software costs >$200 and studios cost thousands and majoring in music costs hundreds of thousands, but we can make getting started so much easier.
Is it really consent for those artists signing to labels when only three companies have total control of all music consumption and production for the mass market? To be clear, artists absolutely have a right to benefit from reproduction of their recordings. I just don't think anyone should have rights to the knowledge built into those creations since in most cases it wasn't theirs to begin with (if their right to this knowledge were affirmed, every new song someone creates could hypothetically have a conga line of lawyer teams clamoring for "their cut" of that chord progression/instrument sample/effect/lyrical theme/style).
I think we intuitively allow for artists to derive and interpolate from their influences because of a baseline understanding that A) it is impossible to create art without influence and B) that there is an inherent value in a human creating art and expressing themselves. How that relates to someone using unlicensed music from actual humans to train an AI model in order to profit off of the collective work of thousands of actual human artists, I have no idea.
edit:
> I think that bringing the sum of human musical knowledge to anybody who cares to try for free is a moral good
Generative AI music isn't in any way accomplishing this goal. A free Spotify account with ads accomplishes this goal -- being able to generate a passable tune using a mish-mash of existing human works isn't bringing musical knowledge to the masses, it's just enabling end users to entertain themselves and you to profit from that.
> Is it really consent for those artists signing to labels
Yes? Ignoring the fact that there are independent labels outside the ownership of the Big Three you mention, artists enter into contracts with labels consensually because of the benefits the label can offer them. You train your model on these artists' output without their consent, credit or notification, profit off of it and offer nothing in return to the artists.
A) Agreed! B) So I guess the argument here is that this doesn't apply to AI music. I think that if someone really pours their soul into the lyrics of a song and regenerates/experiments with prompts until it's just right, and maybe even contributes a melody or starting point that's still a human creating art and expressing themselves. It's definitely not as difficult as creating a song from scratch, but I've been told similar arguments were made regarding whether photography was art when that became a thing.
btw, if the user of the AI doesn't do any of the above then I think the US copyright office says it can't be copyrighted in the first place (so no profiting for them anyway).
> if the user of the AI doesn't do any of the above then I think the US copyright office says it can't be copyrighted in the first place (so no profiting for them anyway).
Am I understanding right that the point here is that while you are able to get away with using copyrighted material to turn a profit, your end users cannot, so no worries?
I think there are a few fallacies at play here:
1. Anthropomorphizing the kind of “influence” and “learning” these tools are doing, which is quite unrelated to the human process
2. Underrepresenting the massive differences in scale when comparing the human process of learning vs. the massive data centers training the AI models
3. Ignoring that this isn’t just about influence, it’s about the fact that the models would not exist at all, if not for the work of the artists it was trained on
> Is it really consent for those artists signing to labels when only three companies have total control of all music consumption and production for the mass market?
This premise is false. I have made plenty of money busking on the street, for example. Or selling audio recordings at shows.
> To be clear, artists absolutely have a right to benefit from reproduction of their recordings.
This is correct. Artists benefit when you pay them for the right to reproduce. When you don't (like what you are doing), you get sued. Here's a YouTube video covering 9 examples:
https://www.youtube.com/watch?v=IIVSt8Y1zeQ
> I just don't think anyone should have rights to the knowledge built into those creations since in most cases it wasn't theirs to begin with
What?
> I have made plenty of money busking on the street
That's why I specified mass market. However, given a choice between literally being on the street and working with a record label I'd probably choose the label, though I don't know about others.
> pay them for the right to reproduce
My point is learning patterns/styles does not equate to reproducing their recordings. If someone wants to listen to "Hey Jude" they cannot do so with our model, they must go to Spotify. There are cases where models from our competitors were trained for too long on too small a dataset and were able to recite songs, but that's a bug they admit is wrong and are fighting against, not a feature.
> in most cases it wasn't theirs to begin with
In most cases they did not invent the chord progression they're using or instruments they're playing or style they're using or even the lyrical themes they're singing. All are based on what came before and the musicians that come after them are able to use any new knowledge they contribute freely. It's all a fork of a fork of a fork of a fork, and if everyone along the line decided they were entitled to a cut we'd have disaster.
Law should be considered to be artificial rules optimized for the collective good of society.
What's the worst that can happen if we allow unregulated AI training on existing music? Musician as a job won't exist anymore, except for the greatest artists. But it makes creating music much more accessible to billions of people. Is it good music? Let the market decide. And people will still make music because the creative process is enjoyable.
The animus towards AI-generated music stems deeply from job security. I work in software and I see it as more likely that AI will eventually be able to replace software devs. I may lose my job if that happens. But I don't care. Find another career. Humanity needs to progress instead of stagnating for the sake of a few interest groups.
I don't work as a musician so it's nothing to do with job security -- I think that using artists' output without their consent in order to train a soulless AI model for some tech middleman to profit from is repugnant, and the cheap rhetoric about democratizing music and "bringing music to the masses!" adds insult to injury. I can guarantee if OP's intellectual property was violated in this project, like somebody ripping off their model or trademark, they'd be suing, but they conveniently handwave away mass scale IP theft when it comes to musicians.
I’m skeptical about how much value AI art is going to really contribute to humanity but as a lifelong opponent of copyright I have to roll my eyes when I see people arguing against it on behalf of real artists, all of whom are thieves in the best case and imitators in the worst.
Yeah every musician has a story of writing a new song, bringing it to the band, and they say "oh, this sounds just like [song]." It's almost impossible to make something truly novel.
> almost impossible to make something truly novel
But beyond the originality !== novelty discussion, I'm not sure how we've come to equate 'creativity' (and the rights to retaining it) to a sort of fingerprint encoding one's work. As if a band, artist or creator should stick to a certain brand once invented, and we can sufficiently capture that brand in dense legalese or increasingly, stylistic prompts.
How many of today's artists just 'riffing' off existing motifs will remain, if the end result of their creative endeavours will be absorbed into generative tools in some manner? What's the incentive for indies to distribute digitally, beyond the guarantee their works will provide the (auditory) fingerprints for the next content generation system?
I have written and performed many songs over many bands. At no point did anybody compare my work to any other artist's work, because it is genuinely unique.
This is you, right?
https://cmahoney.bandcamp.com/track/deck-the-halls
Citation needed. Where can I hear some of your work?
Let's hear it.
The problem is techbro corporates trying to make megabucks of profit off of other people's art.
Intellectual property laws for thee but not for me, I guess.
Megacorporations owning copyrights to the majority of IPs (music, games, etc.) is a capitalism/monopoly problem. How does getting rid of copyright and allowing your company to profit off other people's work in any way solve that issue?
no one can actually explain the value OpenAI adds to humanity. What massive loss? What have we gained from this entity other than another billionaire riding a hype cycle?
These high-quality music models require pirating many, many terabytes of music. Torrents are the main way to do it, but they likely scraped sites like Bandcamp, Soundcloud and YouTube.
AI music is a weird business model. They hope that there's enough money peddling music slop after paying off the labels (and maybe eventually the independent music platforms) whose music they stole. Meanwhile, not even Spotify can figure out how to be reliably profitable, serving music people want to hear.
The movie Electric Dreams is now the most prescient '80s movie about gen AI so far; An architect builds an approximation engine on his home PC which then ingests the whole internet/TV and learns to compose music for the cute girl next door who plays the cello. Song mis-attribution is the central theme. The hit song from the movie Together In Electric Dreams is actually from the perspective of the gen AI choosing to self-destruct as a final show of his love. https://www.youtube.com/watch?v=kDV-_q-iaK8
I bring it up only to provide a bit of balance to the soulless slop debate, proving creators can have diverging opinions on what is good in music creation and life—they don't all feel threatened by poor substitutes no one can possibly enjoy.
This is super cool! Thanks for the hard work you've clearly put into this.
My dream product in this space (...that I didn't know existed until I discovered your site about 10 minutes ago LOL):
I listen to music when I work/code, and I used to loooooove Spotify Playlist Radio (a feature they killed for reasons I will never understand) because it helped me discover new music in the style of music I already enjoyed working to. Liked a song? Add it to the seed list and click play to fine-tune the radio station.
So what I really want is just a fine-tuneable infinite stream of novel music to work to. And by fine-tuneable, I mean I'd love to be able to nudge the generation (Pandora style) with thumbs ups / thumbs downs, or other more specific guidance/feedback (more bass, faster tempo, etc.) until I have this perfectly crafted, customized-for-me stream of music.
I'd probably listen to it all day and happily pay $$ for this.
Is this a pipe dream?
I am with you. I want this too. Maybe somebody can make it with their API?
how is this better or different from suno besides api? I'm assuming since you are smaller the quality is not as good and the depth not as wide.
Suno's RVQ-token-based language model is tuned to give you an acceptable song that most of their userbase would prefer every single time, but it isn't very diverse. Our diffusion model is much more diverse and has higher vocal audio quality, but the results aren't always consistent (just like Flux et al). However, since we have unlimited generations, this can be worked around. We're also never going to preference-tune our model because I think the stuff that is lost in that process is valuable.
I use both. Sonauto sounds more "real" and varied than what I can get with Suno.
Not related to this post, but I was wondering about AI music generators and I don't have experience with their capabilities. The ones I know seem catered to making entire songs.
I was having a discussion with a friend who writes a lot of guitar music but can also play bass and sing. However, getting good drums is a problem. What he'd like is a service to upload his songs in some form (just guitar, or a mixed version with bass and vocals) and get an output that layers a drum track without altering the input. Ideally with appropriate fills, etc. I mean, just getting an in-time drum stem would probably be even better.
Is there any GenAI service to do this kind of incremental additive drums?
There's work in that area, it's sometimes called "accompaniment generation."
https://arxiv.org/abs/2301.12662
https://fastsag.github.io/
Not sure about GenAI, but Logic Pro has the ability to add a Session Drummer which can be set to track a given bass stem and produce passable drums for a song.
Creating music is the most rewarding thing I’ve found in life, and I can’t wrap my head around why anyone would want to automate that away.
Less of this, more robots that do my dishes, please.
We aren't trying to force musicians to stop doing what they love, we're trying to give everyone else a taste of the fun of making music.
You need an OpenAPI spec!
Honestly pretty cool. I'm curious how easy it will be for different video platforms and editors to work it in as a feature or maybe plugin
ah sweet man-made horrors beyond comprehension
What is the point of generating this low quality AI slop music, what real use case do you have in mind?
For the consumer stuff: It's fun, and IMO that's enough. Not every song has to be peak artistic quality pushing the world forward, sometimes it's enough to bring a smile to a friend's face by making a song about them. If you think their art is slop you shouldn't have to listen to it (IMO Spotify et al should have an optional "no AI music" filter for now).
For the API: I think this could be integrated into artists' workflows in lots of ways we can't even imagine right now as it gets better. One example I gave above was generating transitions between songs.
the reason a song from a friend makes you happy is directly related to the effort behind it, this is totally meaningless.
You know that postcard they sent you is not actually a Photoshop-free picture taken by them. They just addressed it to you; you should totally dump them from your life.
obtaining, addressing, and writing a message in a post card is the effort, not the picture.
The reason anything makes anyone happy is completely subjective, as evidenced by the many people who have told us our app made them and/or their friends and family happy.
that's just your opinion.
I made little gift songs for friends for a while. It was nice and fun. Making a road-trip theme song for friends on a vacation is way fun, and kinda locks in the moment.
I also used it when I was living in New Orleans to help a friend come up with a riff for a live set he had, which had some unusual constraints (only a singer, drummer and trombone, and no others, in an echoey space). He used the generated song hook as inspiration for that night's arrangement.
There's lots of stuff, and some of it supports artists who have tight timelines and want creative support.
There's so much real independent music out there that actually has meaning. I hope you didn't tell your friend you wrote the song, because if someone tricked me into listening to generated not-art and I found out afterwards, I would consider them a liar.
What your friend did, using generation for inspiration for real music he creates is fine. But if someone gifted me an AI generated song I would ask why they didn't pay a few dollars -- honestly not much more -- to a real artist to do the same.
Ten years ago a friend of mine did that, hired a real person, and it cost less than $20 to write a ditty. That's comparable to the cost in tokens for an AI except you could support a real human artist instead of megalomaniac Yarvinists Sam Altman and friends.
And the song would have real meaning. You gave your friend a non-gift. The Let Me Google That For You of gifts. Honestly if one of my friends did that I'd wonder if they even like me.
> But if someone gifted me an AI generated song I would ask why they didn't pay a few dollars -- honestly not much more -- to a real artist to do the same.
I literally made a Mountain Goats song about them playing a fantasy video game together with their daughter as we all sat on the couch. This did not rob any artist of any amount of money they ever would have seen. It was a novel moment accentuated and joyful for humans at zero cost to anyone else. The creative world is not zero-sum like you're presenting it to be
The problem with AI music, and in fact AI in general, is that we've spent the last few decades aggressively attacking the idea that art should get paid for at all, and yet people still do it, because they love it. So musicians work for pennies, and yet people still need to replace them with a machine.
So even if you just pay someone else to make you a song, it's not really any more expensive than this. Same with painting. What does this AI bring to the table, at all? It grosses me out.
People on this site should go pick up a guitar and write a three-chord song about someone; it'll take you a day, if that. It's not hard! It's fun!
The problem with real music is that it requires a hefty number of musicians to establish a genre. This number could be somewhere in the range of 100 to 1000 musicians.
When this critical number is not amassed then the genre effectively dies.
With A.I. we can resurrect dead genres, but not only that, we can combine genres together, popular genres with one another, also popular and unpopular genres or popular and dead genres.
Using A.I. for music is easier and much faster than traditional means, and this could greatly reduce the critical mass of musicians to support a genre. It could be reduced as much as 10 times, or 100 times, like one person creating 10000 songs or something similar.
By trying to compare A.I. music to traditional music, you are comparing 10 songs a real band makes with 10000 songs an A.I. (human) musician makes. It's an apples-and-oranges comparison.
I don't see why human music cannot be a genre, the best of all genres but just one, and an innumerable amount of A.I. genres which may not be so good, but they are infinite.
The real human music genre might be the best forever or just for the next 3 years, but so what? Let there be more genres some good some bad. No one is gonna listen to a cheap copy of an already existing song of an already existing genre, but songs already in existence should be used to train A.I. weights.
Regarding A.I. weights, smaller models forget much of the information they are trained on, and they are cheaper, faster and easier to fine-tune, and probably also easier to apply RL reasoning to. In that way, A.I. musicians (or real musicians) could run the model on their own computers and use it as an instrument instead of relying on companies with big, slow and expensive models.
And sometimes big and inefficient models copy text/code/music verbatim from the training data. But this is a bug; when small models become competitive enough, most people are gonna use those. They might even carry them around, like a personal band always ready to make melodies for them.
I’m a pretty big music fan and I have no idea what you’re on about. Where did you get this theory?
> The problem with real music, is that it requires a hefty amount of musicians to establish a genre.
Why is establishing genre a goal in the first place?
> This amount could be somewhere in the range of 100 to 1000 musicians.
This is demonstrably false. Genre is defined by critical consensus, and it can arise around one or a handful of bands.
> With A.I. we can resurrect dead genres
What dead genre are you after? I’d imagine there are folk styles that haven’t been kept alive, but I question whether AI recreations would satisfy anyone. I’d rather listen to authentic recordings instead. And if the genre doesn’t have a significant recorded catalog, you can’t train a generative AI to produce it anyway.
Yeah, it seems like a pretty contrived example and theory.
I think what the OP is trying to articulate is that they are aware of more genres now? Maybe AI makes exploration of niche genres more exciting and participatory for them. They are finding new genres and "expanding them", but it's just bc they were ignorant of them (or unengaged with the content of the song style) before they could participate in this way. I dunno, just trying to think what they might have experienced that would make them think some new universal was coming true * shrug *
>Why is establishing genre a goal in the first place?
What is culture, if not a common agreement on what is beautiful and ugly? Establishing a genre in music is not a goal, but we see it happen over and over again. It is how humans operate since forever, we mimic one another in fashion, in music and many other things.
> Genre is defined by critical consensus, and it can arise around one or a handful of bands.
It arises around a handful of bands, but if it doesn't grow past 5 bands let's say, we are talking about 50 songs in total every year. Who listens nowadays to only 50 songs per year?
> And if the genre doesn’t have a significant recorded catalog, you can’t train a generative AI to produce it anyway.
Yes you can. Synthetic data generation is a big thing already and tens of millions of dollars are poured into it every year.
I’m not sure if you misunderstand genre, or the way humans relate to music, or both? Culture is not genre. I don’t know anyone who listens to only the current year’s output from a single genre.
I haven’t done the analysis, but consider someone who listens to pop radio: if one new song per week makes it into heavy rotation I’d say that sounds like the right ballpark.
Personally I’d be ecstatic if there were 50 worthwhile new songs to listen to each year.
I understand synthetic data. I question whether anyone will accept the results.
Some kind of Dadaist movement I guess. Listen to Breathe of Death, it’s hilarious and then you cry.
What is slop music?
What makes these tracks "slop"?
https://sonauto.ai/song/942c0122-9358-4805-96c4-eb0537d97ca2
https://sonauto.ai/song/ba87e490-19f0-47c5-acb0-83a3b57a90e9
https://sonauto.ai/song/2b217436-0876-49bc-8a9c-d5807e626962
Signed up with Gmail, and I get 'Generation Failed' with every attempt. Please don't email me or add me to your marketing list.
There was a single unhealthy worker that didn't get caught, we just killed it.
Without disclosing your training data, this should be considered piracy and removed from HN.
What rule is there against discussing piracy on HN?
This is promotion of piracy and trying to get the community to engage and validate it - if you can’t see the line then maybe take the time to learn that it exists and how to recognize it.
The courts haven't decided whether or not this is even considered piracy or fair use, besides there are courts all over the world...
What piracy?
Copyright does not cover ideas, generic patterns, timbres etc.
If you know music history, then you will know that classical composers borrowed heavily. Without it, we would still be at the stone age level.
Since these algorithms aren't sentient, I'd say they shouldn't have the same rights and obligations as us. Have you heard of a level playing field?
This situation can easily spiral out of control in a way that you end up with an oligarchy in music, where AI captures most of the attention, since it will be backed by those with most means to shove it in your face.
So yes, this is piracy. Hopefully the law will catch onto the ethics.
Over the years I've seen people get a lot of hate for things they've poured their souls into who turn around and post snarky/insulting responses that ended up getting them into even more hot water. I always wondered why they didn't respond with their clearly well thought out reasoning behind what they built instead of the snark, even if the original comment wasn't in good faith either. I understand them a little better now.
It's def valid to ask about the value of projects like this, but I think "Please delete this project, as you are actively making the world worse." isn't the right way to start that discussion if that was your intent. I also detailed my thoughts about the whole industry a little further down so I'll avoid duplicating that.
You know, most people go through their lives without attracting this kind of criticism. They might laugh or call you foolish, but to consistently tell you you’re actively making the world a worse place? That’s reserved for a special class of assholes. I’d take the feedback seriously.
i agree more times over than I can count. It's pointless, borderline offensive, will not enrich anyone, and makes us all worse off.