TL;DR

ACE-Step 1.5 is an open-source music generation model that matches commercial services like Suno v4.5 in quality while running locally on consumer hardware. Generate a full 4-minute song in under 10 seconds on an RTX 3090, with VRAM requirements as low as 4GB.

Why it matters: AI music generation has been locked behind subscriptions ($10-30/month for Suno, Udio). ACE-Step 1.5 changes that—MIT-licensed, fully local, with no usage limits. Train custom styles from just 8 songs in under an hour.

Who it’s for: Musicians seeking AI-assisted ideation, content creators needing custom background music, developers building music-related applications, and anyone tired of paying for AI music subscriptions.


The Open-Source Music Generation Revolution

Until now, if you wanted high-quality AI-generated music, you had two choices: pay Suno $10-30/month or use Udio’s limited free tier. Both are cloud-based, both have usage restrictions, and both keep your creations on their servers.

ACE-Step 1.5 (released February 3, 2026) breaks this model wide open. It’s the first open-source music generation model that genuinely competes with commercial offerings—and in some benchmarks, it sits between Suno v4.5 and Suno v5 in quality.

The repository hit 2,200+ stars within 3 days of release. The original ACE-Step has 3,900+ stars. The community reception has been… enthusiastic.

“This is absolutely nuts, and I love the separation of concerns in the architecture. It opens up a lot of possibilities. Fantastic work!!” — r/LocalLLaMA


What Makes ACE-Step 1.5 Special

Blazing Fast Generation

Speed comparisons tell the story:

Model                      4-Minute Song   Notes
ACE-Step 1.5 (A100)        2 seconds       10-120× faster
ACE-Step 1.5 (RTX 3090)    ~10 seconds     Consumer GPU
Most commercial services   2-4 minutes     Cloud-based
Some competitors           20+ seconds     Open-source

On an RTX 4070 Super, a 2-minute song takes about 2 minutes to generate, which is still faster end-to-end than uploading to, waiting on, and downloading from a cloud service.

Runs on Consumer Hardware

The VRAM requirements are surprisingly low:

  • Minimum: Less than 4GB VRAM (basic generation)
  • Recommended: 8GB for full-length songs
  • LoRA training: 12GB (one hour for custom style)

Tested GPUs: RTX 4060, RTX 3090, RTX 4070 Super, A100. Works on AMD ROCm and even CPU/Apple Silicon (slower).

Commercial-Grade Quality

Benchmarks from the technical paper show ACE-Step 1.5 scoring between Suno v4.5 and Suno v5 on standard evaluation metrics. Style alignment and lyric adherence are strong. The model supports:

  • 1000+ instruments and styles with fine-grained timbre control
  • 50+ languages for lyrics
  • 10 seconds to 10 minutes of audio
  • Batch generation of up to 8 songs simultaneously

Beyond Basic Generation

ACE-Step 1.5 isn’t just text-to-music. It supports:

  • Reference audio input — Guide generation with existing tracks
  • Cover generation — Generate cover versions of existing songs
  • Repaint & edit — Selective local audio editing
  • Track separation — Split audio into stems
  • Multi-track layering — Add instrument or vocal layers, similar to Suno Studio
  • Vocal-to-BGM — Generate accompaniment for vocals
  • LoRA training — Personal style in 8 songs, 1 hour

The Architecture: Why It’s Fast AND Good

ACE-Step 1.5 uses a novel hybrid approach:

  1. Language Model (LM) acts as an “omni-capable planner” that transforms simple prompts into comprehensive song blueprints. It handles structure, metadata, and lyrics via Chain-of-Thought reasoning.

  2. Diffusion Transformer (DiT) generates the actual audio from the blueprint. This is where the speed comes from—efficient diffusion in latent space.

  3. Intrinsic Reinforcement Learning aligns the components without external reward models, avoiding biases from human preference datasets.

Model variants:

  • acestep-v15-turbo — Fast generation (default)
  • acestep-5Hz-lm-0.6B — Smaller LM, faster
  • acestep-5Hz-lm-1.7B — Larger LM, better prompt adherence

Self-Hosting ACE-Step 1.5

Quick Start (5 Minutes)

The fastest way to get running:

# Install uv package manager (if not installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repo
git clone https://github.com/ACE-Step/ACE-Step-1.5.git
cd ACE-Step-1.5

# Install dependencies
uv sync

# Launch the Gradio Web UI
uv run acestep

Open http://localhost:7860 in your browser. Models download automatically on first run (~6GB total).
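
If you're running on a headless box, a quick way to confirm the server is up (default port assumed):

# Expect "200" once the Gradio server has finished loading
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:7860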

Windows One-Click Install

For Windows users, there’s a pre-built portable package:

  1. Download ACE-Step-1.5.7z
  2. Extract anywhere
  3. Run start_gradio_ui.bat

Includes Python and all dependencies. Requires CUDA 12.8.

REST API Server

For programmatic access:

uv run acestep-api
# API available at http://localhost:8001

Or enable API alongside the web UI:

uv run acestep --enable-api --api-key sk-your-key --port 8001
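
The exact routes aren't documented in this post, so the request below is a hypothetical shape rather than the confirmed schema; check the project's API reference for the real endpoint and fields:

# Hypothetical request shape; the endpoint path and JSON fields are
# illustrative, so consult the repository's API docs for the real schema
curl -X POST http://localhost:8001/generate \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a smooth neo-soul instrumental", "duration": 120}' \
  -o song.wav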

ComfyUI Integration

The community has created a ComfyUI integration for workflow-based music generation:

# Inside ComfyUI custom_nodes folder
git clone https://github.com/billwuhao/ComfyUI_ACE-Step

Supports LoRA, repainting, remixing, and audio-to-audio workflows.

AMD GPU Support

ACE-Step runs on AMD GPUs via ROCm, but requires a workaround:

# Activate your venv first (important!)
source .venv/bin/activate

# Run directly instead of via uv
python -m acestep.acestep_v15_pipeline --server-name 127.0.0.1 --port 7860
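
Before launching, it's worth confirming that the ROCm build of PyTorch actually sees your GPU; ROCm builds report through the same torch.cuda API used on NVIDIA hardware:

# Should print "True" on a working ROCm install
python -c "import torch; print(torch.cuda.is_available())"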

Configuration Deep Dive

Command Line Options

# Public access (network-accessible)
uv run acestep --server-name 0.0.0.0 --share

# Change language (en, zh, ja)
uv run acestep --language zh

# Pre-load models on startup
uv run acestep --init_service true

# Use the larger LM for better prompts
uv run acestep --lm_model_path acestep-5Hz-lm-1.7B

# Force CPU offload (auto-enabled when VRAM < 16GB)
uv run acestep --offload_to_cpu true

# Add authentication
uv run acestep --auth-username admin --auth-password secret
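
These flags compose. A sketch of a LAN-accessible instance with basic auth and the REST API enabled, using only the options shown above:

# LAN-accessible UI with basic auth and the API enabled
uv run acestep \
  --server-name 0.0.0.0 \
  --enable-api --api-key sk-your-key \
  --auth-username admin --auth-password secret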

Environment Variables

Create a .env file for persistent config:

ACESTEP_INIT_LLM=true           # Force-enable LLM
ACESTEP_CONFIG_PATH=acestep-v15-turbo
ACESTEP_LM_MODEL_PATH=acestep-5Hz-lm-1.7B
ACESTEP_DOWNLOAD_SOURCE=huggingface  # or modelscope
ACESTEP_API_KEY=sk-your-secret-key

VRAM Optimization

For GPUs with limited VRAM:

# Auto-enables if VRAM < 16GB
uv run acestep --offload_to_cpu true

# Use smaller LM (0.6B vs 1.7B)
uv run acestep --lm_model_path acestep-5Hz-lm-0.6B

# Disable LLM entirely (DiT-only mode)
uv run acestep --init_llm false
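
To check whether these settings keep a generation inside your card's budget, watch GPU memory while a song renders (NVIDIA only; nvidia-smi ships with the driver):

# Poll GPU memory usage once per second during generation
watch -n 1 nvidia-smi --query-gpu=memory.used,memory.total --format=csv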

Prompt Engineering for Music

ACE-Step 1.5 responds well to detailed, descriptive prompts. Here are examples from the acemusic.ai playground:

Neo-Soul Jazz

A smooth neo-soul instrumental built on a relaxed groove from an upright 
bass with light brushwork on the drums and warm electric piano chords. 
A soulful alto saxophone enters to play a memorable, lyrical melody that 
serves as the main theme. The arrangement is spacious and clean, allowing 
each instrument room to breathe. Following the main section, there's an 
extended improvisational passage featuring expressive saxophone runs.

Synthwave

An energetic, driving synthwave track propelled by a punchy four-on-the-floor 
drum machine beat and a pulsing synth bassline. A bright, arpeggiated synth 
lead carries the main melodic hook, weaving through atmospheric synth pads 
that provide harmonic depth. The arrangement builds dynamically, introducing 
new synth layers and filter sweeps to maintain momentum.

Anime J-Rock Theme

An explosive j-rock anthem driven by crunchy, overdriven electric guitars 
playing powerful riffs and chords. A punchy acoustic drum kit lays down 
an energetic 4/4 beat with crashing cymbals, locked in with a solid bassline. 
The track is fronted by a powerful female lead vocal performance, delivered 
with clarity, strength, and conviction typical of anime theme songs.

Lo-Fi Hip-Hop

A melancholic and atmospheric hip-hop track built on a foundation of a clean, 
arpeggiated piano melody and a deep, resonant sub-bass. The beat is a sparse, 
lo-fi hip-hop groove with a soft kick and a snappy snare. A male vocalist 
delivers lyrics with delay to create a distant, introspective feel.

Tips for Better Results

  1. Describe instruments specifically — “warm electric piano” beats “piano”
  2. Include arrangement details — verse/chorus structure, builds, breakdowns
  3. Specify the vibe — “melancholic,” “energetic,” “dreamy”
  4. Mention production style — “lo-fi,” “polished,” “raw”
  5. Use the LLM’s query rewriting — Let it expand simple prompts

Training Custom Styles (LoRA)

One of ACE-Step 1.5’s killer features: train a personalized style from just 8 songs in about an hour on a 12GB GPU.

The Gradio UI includes one-click annotation and training. The workflow:

  1. Upload 8+ reference songs in your target style
  2. The model auto-annotates BPM, key, and captions
  3. Click train — wait ~1 hour on RTX 3090
  4. Load your LoRA and generate in your custom style
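
Before step 1, it can help to sanity-check the audio files themselves. The path below is illustrative and ffprobe (part of ffmpeg) is assumed to be installed; the repo's docs define the formats the trainer actually accepts:

# Illustrative pre-flight: print each reference track's sample rate
# and duration before uploading it in the training tab
for f in ~/Music/my-style/*.wav; do
  ffprobe -v error -show_entries stream=sample_rate:format=duration \
    -of default=noprint_wrappers=1 "$f"
done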

This opens possibilities for:

  • Personal signature sounds
  • Brand-specific audio
  • Genre-specific fine-tuning
  • Artist style approximation (use responsibly)

🎵 Listen: Generated Examples

Head to acemusic.ai/playground/trending to hear what ACE-Step 1.5 can produce. Trending tracks include:

  • “Ember Swing” — Neo-soul jazz, 3:41
  • “Dreamwave Drive” — Synthwave, 3:03
  • “I’m happy” — Korean indie rock, 2:00
  • “Echoes of the past” — Anthemic hip-hop, 2:10

All generated by ACE-Step 1.5 with the prompts visible on each track.


Community Reactions & Honest Limitations

What People Love

  • “An open-source model with quality approaching Suno v4.5/v5… running locally on a potato GPU. No subscriptions. No API.”
  • The architecture’s separation of concerns opens customization possibilities
  • LoRA training democratizes personalized music AI
  • Speed is genuinely impressive for the quality level

Known Limitations

Not everyone is fully satisfied:

  • Prompt adherence — Some users report electronic genres don’t match expectations
  • Mastering quality — “Sounds like loudness war era music” — could use better mastering
  • Genre specificity — May not understand niche electronic subgenres well
  • Coherence — Some outputs lack long-term musical coherence

Consensus: Excellent for rapid prototyping and ideation. May not replace professional production pipelines, but it’s getting close.


Comparison: ACE-Step vs Alternatives

Feature              ACE-Step 1.5   Suno v4.5   Udio        DiffRhythm
Open Source          ✅ MIT         ❌          ❌          ✅
Local/Self-Hosted    ✅             ❌          ❌          ✅
Quality              Good-Great     Great       Good        Medium
Speed (4-min song)   2s-10s         2-4 min     1-2 min     ~10s
Min VRAM             4GB            N/A         N/A         8GB
LoRA Training        ✅             ❌          ❌          —
Price                Free           $10-30/mo   Free tier   Free
Languages            50+            50+         Limited     Limited

Use Cases

For Musicians

  • Ideation: Generate 10 variations of a concept in minutes
  • Demo creation: Quick backing tracks for songwriting
  • Style exploration: Try genres outside your comfort zone
  • LoRA training: Capture your signature sound

For Content Creators

  • Background music: Custom tracks for videos, no licensing issues
  • Podcast intros: Unique, on-brand audio
  • Game prototyping: Placeholder soundtracks that might become final

For Developers

  • API integration: Build music features into apps
  • Workflow automation: Generate audio programmatically
  • Custom interfaces: Build on top of the REST API
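
As a sketch of the workflow-automation idea, a loop that requests several takes on one prompt from the local API; the endpoint and fields are the same hypothetical shape as in the REST API section above, not a confirmed schema:

# Batch a few takes of one prompt (hypothetical request shape)
for i in 1 2 3; do
  curl -s -X POST http://localhost:8001/generate \
    -H "Authorization: Bearer sk-your-key" \
    -H "Content-Type: application/json" \
    -d '{"prompt": "driving synthwave with an arpeggiated lead"}' \
    -o "take-$i.wav"
done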

Quick Reference

# Install
git clone https://github.com/ACE-Step/ACE-Step-1.5.git
cd ACE-Step-1.5 && uv sync

# Run Web UI
uv run acestep

# Run API server
uv run acestep-api

# With authentication
uv run acestep --enable-api --api-key sk-xxx --auth-username admin --auth-password secret

# Low VRAM mode
uv run acestep --offload_to_cpu true --lm_model_path acestep-5Hz-lm-0.6B


Final Thoughts

ACE-Step 1.5 represents a genuine inflection point for open-source AI music. It’s not perfect—prompt adherence and mastering quality have room to improve—but it’s good enough to be genuinely useful, and it’s fast enough to integrate into creative workflows.

For the first time, you can run Suno-quality music generation on your own hardware, with no subscriptions, no usage limits, and full control over your outputs. Train custom styles in an hour. Build it into your applications via API.

The future of AI music just went local, and it’s MIT-licensed.