TongFlow Team

An Open-Source Multi-Modal GenAI Workflow Studio

Every AI model is a node on an infinite canvas. Wire modalities, combine results — self-host or use our cloud. Open source. Multi-modal. Runs anywhere.

Downloads: macOS (Apple Silicon) · macOS (Intel) · Windows · All platforms & versions

Open source · AGPL-3.0 · Plugin ecosystem · Node-based canvas · Text, image, video, audio, 3D

Add → Transform → Combine

One canvas. Every modality.

No parameter panels. No manual wiring. Three operations — add materials, transform between modalities, combine the results.

Add

Drop any material onto the canvas: text, images, audio, video, documents, URLs, or 3D models. Everything becomes a node.

Transform

Text→image, image→video, audio→text — every AI model is a modality transform, exposed as a swappable node. Switch models without rewiring.
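
One way to picture "switch models without rewiring" is a node whose input and output modalities are fixed while the model behind it is a replaceable field. The sketch below is illustrative only: TransformNode, Edge, and swap_model are hypothetical names, not TongFlow's actual data model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TransformNode:
    """Hypothetical node: fixed modalities, replaceable model."""
    node_id: str
    source: str  # input modality, e.g. "text"
    target: str  # output modality, e.g. "image"
    model: str   # model currently filling the slot


@dataclass(frozen=True)
class Edge:
    """Hypothetical connection between two nodes, referenced by id."""
    from_id: str
    to_id: str


def swap_model(node: TransformNode, new_model: str) -> TransformNode:
    # Only the model field changes. The id and modalities are untouched,
    # so any edge that refers to node_id is still valid afterwards.
    return replace(node, model=new_model)


text_to_image = TransformNode("n1", "text", "image", "Z-Image")
edges = [Edge("prompt", "n1"), Edge("n1", "upscale")]

swapped = swap_model(text_to_image, "FLUX.2 Klein 9B")
```

Because edges point at a node id rather than at a model, the edge list needs no changes after the swap.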

Combine

Image fusion, lip sync, voice cloning, character swap, motion transfer — wire multiple inputs into one output. Built in, not bolted on.

Local-first, always

Workflows and uploaded files live on your own machine. No account required. No telemetry. Your data never leaves without your say-so.

Bring your own keys

Text runs on OpenRouter, Gemini, OpenAI, or DeepSeek — your key, your choice. GPU inference runs on Modal (free tier included).

Real models. Named.

Z-Image, FLUX.2 Klein 9B, LTX-2, SeedVR2, Qwen3, ACE-Step — every model doing actual work is listed in the README, not hidden behind a product name.

What's shipped today

Pulled directly from the README. If a row is here, it works today.

Add: 11 input types

Text, image, photo, sketch, audio file, audio recording, video file, video recording, document, URL, 3D model — drop any material onto the canvas.

Transform: Image

Image generation, image editing (inpaint/redraw), image understanding (captions/Q&A), image upscaling.

Transform: Video

Text-to-video, image-to-video, first/last-frame extraction, video understanding, video upscaling.

Transform: Audio

Music generation, speech synthesis (preset / voice clone / instruction), speech recognition.

Transform: Text

Generate or rewrite copy from a prompt — routed through OpenRouter, Gemini, OpenAI, or DeepSeek depending on the node's model slot.

Combine

Image fusion (multi-reference blending), lip sync (audio+video / audio+image / audio+text → video), voice cloning, character swap, motion transfer, text merging.

Helpers

Concatenate clips, mux audio+video, split by shots, demux, extract audio track, split long text, merge text blocks, filter clips, batch arrange groups.
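
As a rough illustration of what a "mux audio+video" helper does, the sketch below builds an FFmpeg command that takes the picture from one file and the sound from another. It uses standard FFmpeg options, but mux_command and the file names are hypothetical, and TongFlow's own helper may be implemented differently.

```python
from typing import List


def mux_command(video: str, audio: str, output: str) -> List[str]:
    """Build an ffmpeg argv that combines one video and one audio stream."""
    return [
        "ffmpeg", "-y",      # overwrite the output file if it exists
        "-i", video,         # input 0: source of the video stream
        "-i", audio,         # input 1: source of the audio stream
        "-map", "0:v:0",     # take the first video stream of input 0
        "-map", "1:a:0",     # take the first audio stream of input 1
        "-c:v", "copy",      # keep the video bitstream as-is
        "-c:a", "aac",       # encode the audio as AAC
        "-shortest",         # stop when the shorter stream ends
        output,
    ]


cmd = mux_command("clip.mp4", "voice.wav", "out.mp4")
```

To actually run it, pass the list to subprocess.run(cmd, check=True) on a machine with FFmpeg installed.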

Bridges

Document → text, URL → text — bring outside material into the canvas.

Backend & Models

FFmpeg for media pipelines, Modal for GPU workers. Models and providers shipping today: Z-Image, FLUX.2 Klein 9B, LTX-2, SeedVR2, InfiniteTalk, Wan-Animate, ACE-Step, Qwen3, Whisper, Gemini, OpenAI, OpenRouter.

FAQ

Straight answers

What TongFlow is. What it isn't.

Is this really open source?

Yes. AGPL-3.0. Full source at github.com/tong-io/tongflow — read it, fork it, self-host it. The cloud at app.tongflow.com runs the same code.

What's the difference between the cloud and self-hosting?

Same codebase, different setup cost. The cloud is up in seconds with no configuration. Self-hosting gives you full control: your API keys, your files, no account, nothing external. Both are first-class options.

Do I need a GPU?

Not locally. Heavy inference runs on Modal — their free tier includes real H100 time. You bring a Modal token and at least one LLM API key (OpenRouter, Gemini, OpenAI, or DeepSeek). TongFlow itself runs fine on a laptop.

How is this different from ComfyUI or n8n?

ComfyUI is built for image generation. n8n is built for API orchestration. TongFlow treats all seven modalities — text, image, video, audio, speech, music, 3D — as first-class. Combine nodes (lip sync, image fusion, motion transfer) are built in, not third-party extensions.

How do I self-host?

Clone the repository, install dependencies, and start the dev server:

git clone https://github.com/tong-io/tongflow
cd tongflow && pnpm install && pnpm dev

You need Node.js 20+, a Modal token (free tier works), and at least one LLM API key. The README covers everything else.
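
Before running the self-host commands, it can help to check the stated prerequisites. The sketch below is not part of TongFlow. The environment variable names in LLM_KEY_VARS are assumptions based on common provider conventions, so check the README for the names it actually reads.

```python
import re
import shutil
import subprocess
from typing import Mapping, Optional

# Assumed variable names; the README may use different ones.
LLM_KEY_VARS = (
    "OPENROUTER_API_KEY",
    "GEMINI_API_KEY",
    "OPENAI_API_KEY",
    "DEEPSEEK_API_KEY",
)


def node_major(version_text: str) -> Optional[int]:
    """Extract the major version from text such as 'v20.11.0'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    return int(match.group(1)) if match else None


def node_ok(version_text: str, minimum: int = 20) -> bool:
    """True when the version text reports at least the minimum major."""
    major = node_major(version_text)
    return major is not None and major >= minimum


def has_llm_key(env: Mapping[str, str]) -> bool:
    """True when at least one of the assumed key variables is non-empty."""
    return any(env.get(name) for name in LLM_KEY_VARS)


def installed_node_version() -> Optional[str]:
    """Return the output of `node --version`, or None if node is missing."""
    if shutil.which("node") is None:
        return None
    result = subprocess.run(
        ["node", "--version"], capture_output=True, text=True
    )
    return result.stdout.strip() or None
```

A typical call is node_ok(installed_node_version() or "") together with has_llm_key(os.environ), after importing os.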

Can I build my own plugins?

Yes. Define a slot in the ABI, write a Python function decorated with @node_slot, publish it as a package. Any backend works — Modal, Replicate, a local GPU, or a plain API. See the SDK docs.
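
The plugin flow above can be sketched roughly as follows. The real @node_slot decorator lives in the TongFlow SDK and its signature is not shown here, so this example defines its own stand-in with the same name. The slot string, REGISTRY, and run_slot are all hypothetical and only illustrate the register-then-dispatch idea.

```python
from typing import Any, Callable, Dict

# Stand-in registry; the real SDK's decorator and ABI may look different.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def node_slot(slot: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical decorator: register a function under a slot name."""
    def register(func: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[slot] = func
        return func
    return register


@node_slot("text-to-text/uppercase")
def uppercase(text: str) -> str:
    # A trivial transform standing in for a call to Modal, Replicate,
    # a local GPU, or a plain API.
    return text.upper()


def run_slot(slot: str, *args: Any, **kwargs: Any) -> Any:
    """Look up the function registered for a slot and call it."""
    return REGISTRY[slot](*args, **kwargs)


result = run_slot("text-to-text/uppercase", "hello canvas")
```

The point of the pattern is that the canvas only needs the slot name; whatever backend the decorated function calls stays an implementation detail of the plugin.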

What stage is this?

Shipped June 2026. Early days. Contributions, bug reports, and model integrations are very welcome. Discord and GitHub issues are the right places.

Three ways in

Download the desktop app, run it on our cloud, or self-host from source — same open-source code, your call.

Desktop

Free app for macOS & Windows. Download and open.

Download →

Self-Host


git clone https://github.com/tong-io/tongflow

cd tongflow && pnpm install && pnpm dev

View on GitHub →

Cloud

Same open-source codebase. No setup. Running in seconds.

Try Cloud →
