Handy – Free open-source speech-to-text app written in Rust

mathverse · 2025-09-28T11:33:38 1759059218

Even being in Tauri this application just by doing these things takes around 120MB on my M3 Max. It's truly astonishing how modern desktop apps are essentially doing nothing and yet consume so much resources.

- it sets icon on the menubar - it display a window where I can choose which model to use

That's it. 120MB FOR doing nothing.

3oil3 · 2025-09-28T12:37:28 1759063048

I feel the same astonishment! Our computers surely are today faster and stronger and smaller than yesterdays', but did this really translate in something tangible for a user? I feel that besides boot-up, thanks to SSDs rather than gigaHertz, it's not any faster. It's like, all this extra power is used to the maximum, for good and bad reasons, but not focused on making 'it' faster. I get a bit puzzled to why my mac could freeze half a second when I 'cmd+a' in some 1000+ files-full folder.

Why doesn't Excel appear instantly, and why is it 2.29GB now when Excel 98 for Mac was.. 154.31MB? Why is a LAN transfer between two computers still as slow as 1999, 10ishMB/s, when both can simultaneously download at > 100MB/s? I'm not starting with GB-memory-hoarding tabs, when you think about it, it's managed well as a whole, holding 700+ tabs without complaining.

And what about logs? This is a new branch of philosophy, open Console and witness the era of hyperreal siloxal, where computational potential expands asymptotically while user experience flatlines into philosophical absurdity?

dreamcompiler · 2025-09-28T16:13:14 1759075994

It me takes longer to install a large Mac program from the .dmg than it takes to download it in the first place. My internet connection is fairly slow and my disk is an SSD. The only hypothesis that makes sense to me is that MacOS is still riddled with O[n] or even O[n^2] algorithms that have never been improved and this incompetence has been made less visible by ever-faster hardware.

A piece of evidence supporting this hypothesis: rsync (a program written by people who know their craft) on MacOS does essentially the same job as Time Machine, but the former is orders of magnitude faster than the latter.

dundarious · 2025-09-28T19:53:23 1759089203

You can make this app yourself in an hour if you're on Linux and can do some scripting. Mockup below for illustration, but this is the beating heart of a real script:

  # whisper-live.sh: run once and it listens (blocking), run again and it stops listening.
  if ! test -f whisper.quit ; then
    touch whisper.quit
    notify-send -a whisper "listening"
    m="/usr/share/whisper.cpp-model-tiny.en-q5_1/ggml-tiny.en-q5_1.bin"
    txt="$(ffmpeg -hide_banner -loglevel -8 -f pulse -i default -f wav pipe:1 < whisper.quit \
      | whisper-cli -np -m "$m" -f - -otxt -sns 2>/dev/null \
      | tr \\n " " | sed -e 's/^\s*//' -e 's/\s\s*$//')"
    rm -f whisper.quit
    notify-send -a whisper "done listening"
    printf %s "$txt" | wtype -
  else
    printf %s q > whisper.quit
  fi

You can trivially modify it to use wl-copy to copy to clipboard instead, if you prefer that over immediately sending the text to the current window. I set up sway to run a script like this on $mod+Shift+w so it can be done one-handed -- not push to listen, but the script itself toggles listen state on each invocation, so push once to start, again to stop.

alok-g · 2025-09-28T19:32:33 1759087953

The tech industry has such inefficiencies nearly everywhere. There's no good explanation why an AI model that knows so much could be smaller than a typical OS installation.

I could once optimize a solution to produce over 500x improvement. I cannot write about how this came, but it was much easier than initially expected.

See also: Wirth's Law: https://en.wikipedia.org/wiki/Wirth%27s_law

Leftium · 2025-09-29T15:49:01 1759160941

It's a matter of trade-offs.

In theory, Handy could be developed by hand-rolling assembly. Maybe even binary machine code.

- It would probably be much faster, smaller and use less memory. But...

- It would probably not be cross-platform (Handy works on Linux, MacOS, and Windows)

- It would probably take years or decades to develop (Handy was developed by a single dev in single digit months for the initial version)

- It would probably be more difficult to maintain. Instead of re-using general purpose libraries and frameworks, it would all be custom code with the single purpose of supporting Handy.

- Also, Handy uses an LLM for transcription. LLM's are known to require a lot of RAM to perform well. So most of the RAM is probably being used by the transcription model. An LLM is basically a large auto-complete, so you need a lot of RAM to store all the mappings to inputs and outputs. So the hand-rolled assembly version could still use a lot of RAM...

sipjca · 2025-09-28T13:57:50 1759067870

A lot of the bloat comes from dependencies like ONNX or whisper.cpp to accelerate running the model itself

While the UI is doing “nothing” most of the bloat is not from the UI

mathverse · 2025-09-28T14:06:20 1759068380

But do you start onnx and whisper.cpp on fresh install / start? I did nothing. I literally just installed the app and started it without selectin a model.

sipjca · 2025-09-28T15:16:04 1759072564

Oh interesting. I totally misread the original comment, I didn't realize you're talking about RAM usage. 120MB is quite a lot. This surprises me too. There's nothing fancy going on really until the model is chosen.

mathverse · 2025-09-28T15:43:00 1759074180

Yeah exactly. Tbh Tauri is touted as more lightweight than Electron but I have never seen a Tauri application that lived up to this claim.

daakus · 2025-09-28T06:29:27 1759040967

Shameless plug: A brutally minimalist Linux only, whisper.cpp only app: https://github.com/daaku/whispy

I wanted speech-to-text in arbitrary applications on my Linux laptop, and I realized that loading the model was one of the slowest parts. So a daemon process, which triggers recording on/off using SIGUSR2, records using `pw-record` and passes the data to a loaded whisper model, which finally types the text using `ydotool` turned out to be a relatively simple application to build. ~200 lines in Go, or ~150 in Rust (check history for Rust version).

efskap · 2025-09-28T06:40:12 1759041612

I'm very curious about the rewrite. Was Rust slowing you down too much?

daakus · 2025-09-28T07:22:07 1759044127

Just for fun. I like both languages. I thought Rust would be better fit on account of interop with whisper.cpp, but turns out the use of cgo was straight forward in this case. I like that the Go version has minimal 3rd party dependencies compared to the Rust version.

DoctorOW · 2025-09-28T13:15:08 1759065308

Why Linux only? Isn't Go and Whisper.cpp cross platform?

daakus · 2025-09-28T14:32:32 1759069952

It relies on `pw-record` for recording audio and `ydotool` for triggering keyboard input. These are Linux specific. I don't know about Windows, but on my Mac I have a not-yet-public Swift + whisper + CoreAudio + Accessibility based solution that provides similar functionality.

atoav · 2025-09-28T15:43:07 1759074187

That was my guess. Crossplatform Audio input isn't exactly as trivial as using pipewire.

sidhusmart · 2025-10-10T18:55:14 1760122514

I love this tool. Been using this for the past 2 weeks and it works great. Struggles a bit in noisy settings but it's weird talking to your computer in a coffee shop anyways :P

b_e_n_t_o_n · 2025-09-28T00:42:09 1759020129

Why does the title specify the language used when it's not even mentioned on the home page?

01HNNWZ0MV43FF · 2025-09-28T05:30:40 1759037440

If it's Rust or Go it means I won't have to fuss with a runtime like Python or JS, nor a C++ build system

mathverse · 2025-09-28T11:34:56 1759059296

You dont have that using Electron app as well. The runtime is bundled with the binary.

raffraffraff · 2025-09-28T13:52:45 1759067565

Honestly the only thing I avoid these days is python. If something is written in python I generally give it a miss, especially if it has a GUI.

Leftium · 2025-09-28T05:44:15 1759038255

I just copied the title verbatim from the original Show HN: https://hw.leftium.com/#/item/44302416

In case you also have a problem with not using the original HN link: https://news.ycombinator.com/item?id=44302416

(I think the first link is easier to read (CSS/formatting/dark mode), slightly more compact, and contains a link to the original HN post. It's also simple to recreate the HN link manually by inspecting the ID.)

nicce · 2025-09-28T01:23:34 1759022614

Marketing. Honestly, might not be good here since it is not library and not completely written in Rust.

ajsnigrutin · 2025-09-28T01:53:43 1759024423

Marketing for what exactly?

I mean... why would I want this app instead of some other app? Just because it's written in the language of the week? If it said "20% faster than xyz" it would be a much better marketing than saying it's written in rust, even though more than half the code is typescript.

leoedin · 2025-09-28T08:55:10 1759049710

I think there are tangible benefits to this being “not Java or JavaScript”. Or any language that brings a resource intensive runtime with it.

nicce · 2025-09-28T11:46:00 1759059960

More than half is TypeScript to be fair.

quicklime · 2025-09-28T05:28:36 1759037316

The title also mentions that it’s open source, so it could be marketing for potential contributors.

sipjca · 2025-09-28T14:57:07 1759071427

It's primarily this. I'm a novice Rust developer and really would like to improve the code quality across the board, and some of this comes to attracting the right kind of developers to help. Maybe "Rust" in the title helps, maybe it doesn't. Clearly HN doesn't like it and that's okay.

I stated my need for help on the about page as well

> This is my first Rust project, and it shows. There are bugs, rough edges, and architectural decisions that could be better. I’m documenting the known issues openly because I want everyone to understand what they are getting into, and encourage improvement in the project.

nicce · 2025-09-28T15:12:41 1759072361

> Maybe "Rust" in the title helps, maybe it doesn't. Clearly HN doesn't like it and that's okay

HN definitely likes it, when it is used in the correct context. Using Rust in the title is a soft promise for better reliability and quality for the software than on average. But it starts to get controversial when Rust is not purely the controlling part of the software anymore. So people start to complain because it can be misleading marketing which is based on the promise that Rust can offer.

sipjca · 2025-09-28T15:18:41 1759072721

Fair enough, most of the critical code in this case is written in Rust. A Rust transcription library popped out of the project `transcription-rs`. And there is a real-time audio library I'd like to put out which allows for filters. I could have called out to ffmpeg or similar, but I chose to implement an audio pipeline myself (for better or worse)

So makes sense, but there are benefits to writing a desktop application backend in Rust for the ecosystem as well.

ktosobcy · 2025-09-28T10:52:05 1759056725

I'm not sure if it's purely down to "hype".

For me I do tend to prefer apps written in rust/go(/c/etc-copiled) as they are usually less problematic to install (quie often single binary; less headache compared to python stuff for example) and most of the time less resource hungry (anything JS/electron based)... in the end "convenient shortcut to convey aforementioned benefits" :)

tiberriver256 · 2025-09-28T02:10:12 1759025412

It's targeting a very specific group of devs who like to follow trendy stuff..

To that group saying something is "made in rust" is equivalent to saying "it's modern, fast, secure, and made by an expert programmer not some plebe who can't keep up with the times"

hamandcheese · 2025-09-28T03:48:00 1759031280

> and made by an expert programmer

Quite the opposite. You have to be more of an expert programmer to achieve those same goals in C. Rust lowers the skill bar.

Anyways, I agree that the editorialization here is silly.

But also, I am unashamed that "in Rust" does increase my interest in a piece of software, for several of the reasons you mentioned.

Leftium · 2025-09-27T20:35:01 1759005301

Read the creator's description in the original Show HN: https://hw.leftium.com/#/item/44302416

tempodox · 2025-09-28T07:54:41 1759046081

I love it.

How do you clear the history of recordings?

sipjca · 2025-09-28T13:59:14 1759067954

Next version will have it! The main branch already has I’ve just not released the next version yet

Leftium · 2025-09-28T07:57:45 1759046265

I don't think it's possible (yet), but only the last five recordings are stored.

kwar13 · 2025-09-28T08:51:21 1759049481

Nicely done! Seeing that it uses a port of Whisper, here's my shameless plug for a gnome extension I made using Whisper:

https://extensions.gnome.org/extension/8238/gnome-speech2tex...

m13rar · 2025-09-28T01:08:53 1759021733

Awesome . I was looking to build this on my own. Will look at the code and consider contributing cheers.

sipjca · 2025-09-28T01:44:06 1759023846

Hey author of Handy here! Would absolutely love any help, please let me know if there's any way I can make contributing easier!

rammer · 2025-09-28T13:55:24 1759067724

Hi mate is there a way to make this persistent, so I can give a long dictation instead of holding down the space bar all the time?

sipjca · 2025-09-28T14:06:36 1759068396

Yes! Turn off “push to talk”, it will activate when you click the shortcut and stop when you click it again

efskap · 2025-09-28T05:13:10 1759036390

Cool, you just might've saved me some carpal tunnel in the long run xD.

I guess there's no way for the AppImage to use GPU compute, right? Not that it matters much because parakeet is fast enough on CPU anyway.

Leftium · 2025-09-28T05:36:40 1759037800

I think the Whisper models will all use GPU. Only the Parakeet model is limited to CPU.

(I'm unfamiliar with AppImage. Was the model included in the app image, or was there a download after selecting the model?)

mamonoleechi · 2025-09-28T10:59:15 1759057155

not sure this might help, but when you launch the .appimage in a terminal, it shows you the command to extract the files it contains (to speed the loading) ; this might help you find the files you're searching for, maybe :)

sipjca · 2025-09-28T14:00:13 1759068013

Whisper uses Vulkan and Metal acceleration with whisper.cpp

Parakeet is currently CPU only

jonahx · 2025-09-28T00:06:32 1759017992

How good will this local model be compared to, say, your iphone builtin STT?

dcre · 2025-09-28T01:10:58 1759021858

It’s way better. iPhone’s is awful. On macOS, interestingly, the built in dictation seems a bit better than on iOS, but still not as good as Whisper and Parakeet. Worth noting I have never used Whisper Small, only large and turbo. Another comment says Parakeet is the default now, though, despite what the site says.

sipjca · 2025-09-28T01:53:52 1759024432

Author here!

The default recommendation is Parakeet (mainly because it runs fast on a lot more hardware), but definitely think people should experiment with different models and see what is best for them. Personally I found Whisper Medium to be far better than Turbo and Large for my speech, and Parakeet is about on par with Medium, but each have their own quirks.

I'll update the site soon!

dcre · 2025-09-28T05:51:46 1759038706

That's really interesting about medium being better than large. I never bothered trying the smaller models since the big ones were fast enough.

sipjca · 2025-09-28T14:07:25 1759068445

Benchmarks definitely say otherwise, but my anecdotal experience says medium is the best for this application with my voice and microphone

primaprashant · 2025-09-28T07:15:59 1759043759

built something similar for terminal lovers. It's a CLI tool built in Python called hns [1] and uses faster-whisper for completely local speech-to-text. It automatically copies the transcription to the clipboard as well as writes to stdout so you seamlessly paste the transcription in any other application or pipe/redirect it to other programs/files.

[1]: https://github.com/primaprashant/hns

hu3 · 2025-09-27T22:33:41 1759012421

Very cool. Uses whiper small uder the hood.

https://github.com/openai/whisper

geor9e · 2025-09-27T23:42:13 1759016533

nvidia parakeet v3 was the default out of the box and it works surprisingly well

it offers all the different sizes of openai models too

precompute · 2025-09-28T07:43:03 1759045383

This is local, but I've found that external inference is fast enough, as long as you're okay with the possible lack of privacy. My PC isn't beefy enough to really run whisper locally without impacting my workflow, so I use Groq via a shell script. It records until I tell it to stop, then it either copies it to the clipboard or writes it into the last position the cursor was in.

sipjca · 2025-09-28T14:14:34 1759068874

What computer are you using? You really should give Parakeet a try, I find it runs in a few hundred milliseconds even on a Skylake i5 from 10 years ago.

NaomiLehman · 2025-09-28T16:11:40 1759075900

just a heads up. There are many more accurate and faster models than Whisper nowadays. https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

sipjca · 2025-09-29T14:39:24 1759156764

It also uses one of the fastest and most accurate on the ASR leaderboard, Parakeet.

vladstudio · 2025-09-28T06:55:19 1759042519

+1, happy user and a humble contributor.

sipjca · 2025-09-28T15:00:58 1759071658

You're awesome Vlad!

skeptrune · 2025-09-28T09:05:43 1759050343

Amazing! I have been desperately wanting this. Livecaptions doesn't seem to be maintained super well.

ranger_danger · 2025-09-27T22:18:57 1759011537

Anyone know of the opposite? A really easy-to-use text-to-speech program that is cross-platform?

geor9e · 2025-09-27T23:43:37 1759016617

I've tried a lot of them, and the best I found so far is Edge browsers built in microsoft (natural) voices, which I call via javascript or the browsers read aloud function.

yoavm · 2025-09-28T00:34:58 1759019698

Checkout https://github.com/rany2/edge-tts , which exposes it as a Python library and a CLI tool.

derekja · 2025-09-28T05:35:34 1759037734

I’ve been enjoying Kokoro

Amazing what it can do with only 82M parameters

https://www.kokorotts.io/

sipjca · 2025-09-28T15:02:37 1759071757

Curious your use case, I now have quite a lot of experience with releasing desktop apps, and I have done some accessibility work as well, and may be curious to put together a TTS toolkit as well into a desktop app (or even Handy)

dundarious · 2025-09-28T21:27:44 1759094864

piper's amy voice is pleasant enough to me for reading articles, and it's instantaneous and trivial to use, just download the binary and model file.

ranger_danger · 2025-09-28T23:12:44 1759101164

Wow, this is much faster and higher quality than the meloTTS program I was using before, and has many more voices available... although it doesn't appear to support Japanese.

Thank you!

jszymborski · 2025-09-27T22:57:16 1759013836

I've used Speech Note, which works well for STT and TTS.

ompogUe · 2025-09-28T01:55:50 1759024550

Been having fun with this one

https://addons.mozilla.org/en-CA/firefox/addon/read-aloud/

Read Aloud allows you to select from a variety of text-to-speech voices, including those provided natively by the browser, as well as by text-to-speech cloud service providers such as Google Wavenet, Amazon Polly, IBM Watson, and Microsoft. Some of the cloud-based voices may require additional in-app purchase to enable.

...

the shortcut keys ALT-P, ALT-O, ALT-Comma, and ALT-Period can be used to Play/Pause, Stop, Rewind, and Forward, respectively.

oulipo2 · 2025-09-28T07:38:29 1759045109

Nice! There's also the VoiceInk open-source project https://github.com/Beingpax/VoiceInk/

atmanactive · 2025-09-28T11:42:50 1759059770

MacOS only.

majorchord · 2025-09-27T22:20:12 1759011612

TypeScript 53.9% Rust 44.9%

FYI

yoavm · 2025-09-28T00:41:41 1759020101

The README is very clear about it:

Frontend: React + TypeScript with Tailwind CSS for the settings UI Backend: Rust for system integration, audio processing, and ML inference

typpilol · 2025-09-27T22:25:13 1759011913

Lmao. At least it's typescript and not JavaScript!

shakabrah · 2025-09-27T23:38:13 1759016293

Who’s gonna tell him?

nicce · 2025-09-28T01:27:32 1759022852

Yeah. Rust compiles to machine code.

typpilol · 2025-09-28T03:22:19 1759029739

I thought it was a clever joke

loloquwowndueo · 2025-09-28T00:23:20 1759019000

Don’t you dare!

areeba_iqbal · 2025-09-27T22:16:47 1759011407

That's great, nice to see more and more projects of Machine learning being written in rust

dcre · 2025-09-28T01:11:40 1759021900

It’s not really a machine learning project. It’s an application that calls existing models.

amelius · 2025-09-28T08:59:25 1759049965

Repo says:

CPU-optimized speech recognition with Parakeet models

dcre · 2025-09-28T14:08:16 1759068496

I understand that it uses ML models. My point is that it is an end-user application making use of such models. It is recording audio, passing it to the model, and pasting in the resulting text to the focused input. The fact that the middle step happens to involve an ML model is not really intrinsic to anything the app does. If there was a good speech to text program that did not use ML, the app could use that instead and not really be any different.

sipjca · 2025-09-28T14:13:36 1759068816

To be fair on the other side there is a fair lack of specific ML inference libraries in Rust, and this project is pushing some of that forward with Parakeet at the very least. The Rust library `transcribe-rs` came from it and hopefully will support more models in the future.

While certainly it's not an ML project in the sense of I am not training models, the inference stack is just as important. The fact is the application does do inference using ONNX and Whisper.cpp.

ajsnigrutin · 2025-09-28T01:52:20 1759024340

More than half the code is typescript.

sipjca · 2025-09-28T02:13:38 1759025618

It's typescript because it is a Tauri app which uses the system webview to render the UI.

Most of the audio code/inference code is Rust or bindings to libraries like whisper.cpp

amelius · 2025-09-28T09:01:48 1759050108

How handy is this for coding? ;)

thelittleone · 2025-09-28T10:07:19 1759054039

Is it able to isolate the speaker from background noises / voices?

sipjca · 2025-09-28T15:00:37 1759071637

Right now there is fairly minimal processing done to the audio. There is a VAD filter to reduce the non-speech areas. But there is no noise-reduction as such. The audio pipeline could support it though, so if you know any good real time noise reduction filters let me know. Would love to improve the SNR into the models

rgbrgb · 2025-09-28T00:34:51 1759019691

this is a great landing page. I downloaded.

great onboarding too, using it now.

Very handy, thanks!

ashu1461 · 2025-09-28T06:31:24 1759041084

Landing page is indeed very refreshing

sipjca · 2025-09-28T14:13:47 1759068827

thank you!!!

perfmode · 2025-09-28T00:48:50 1759020530

how’s it differ from macos dictation?

geor9e · 2025-10-03T02:27:57 1759458477

Just compare them side by side. On one side, the dictation tech baked into you OS, the other side transformer models like Whisper Large or Parakeet. Mumble from across the room from the mic. The difference is staggering.

dcre · 2025-09-28T01:09:50 1759021790

I find state of the art speech to text models like Whisper and Nvidia Parakeet are a lot better than macOS dictation. I use them through MacWhisper, but this is basically the same.

roscas · 2025-09-27T22:01:41 1759010501

It downloads the model at first execution and also checks versions in github.

That is ok for what is brings. Nice program. Very "handy".

sipjca · 2025-09-28T01:50:50 1759024250

If you prefer a more stripped down version: the original releases (0.1.0 and 0.1.1) shipped with Whisper tiny included and no auto-update feature

mzimbres · 2025-09-28T09:26:25 1759051585

[flagged]

amelius · 2025-09-28T10:37:58 1759055878

How can I call this library from C++?