Private (a.k.a. non-cloud) Voice Assistant

Hi,

OK, so I have to admit that I am a bit anti-cloud these days. Privacy is a big concern, especially in the home, and I don’t like sending my voice data to some cloud via “she who must not be named” (Alexa!).

Besides that, I need to be able to use a smart home server in my campervan, and I don’t always have access to the cloud.

Having looked into it (I am no expert), there seem to be a few options…

The most obvious is Mycroft, because it is the most mature open-source VA out there, but alas they do send data to Google (even though they say it is anonymous). So that is off the list.

That leaves a few choices - Rhasspy, Sepia, Jaco-Assistant and Picovoice.

I have taken Jaco-Assistant off the list because, although it looks promising, I think it needs to mature more before I spend time using it in my projects.

Picovoice is a professional offering with a free personal edition and SDK, but as far as I can tell it is limited to 1000 voice transactions per month. I suspect that is more than enough “turn the lights on/off” commands for anybody, but I don’t like being limited - so that is off my list!

Sepia seems promising and has an interesting method of integrating with any smart home hub. I like that it interfaces with the smart home app and then provides a way to manage the interactions via cards (in the Sepia app). It is a sensible approach, but I would prefer to manage the integration within the smart home solution itself - still worth checking out if you are interested.

Finally - my preferred solution: Rhasspy. This is a project based on the SNIPS voice assistant that was bought out by Sonos a couple of years ago, so it already has a level of maturity. Development is active and progressing well. It has the advantage of being MQTT-based and provides a wide range of speech models, making it highly configurable.

I have seen (but not used) Rhasspy plugins for FHEM and Jeedom; however, because the speech intent data is sent via an MQTT API, it is quite flexible to integrate via scripts.

I have already been able to turn lights on/off in nymea using Rhasspy, and have it confirm this via text-to-speech, also using Rhasspy TTS models. This is a work in progress and very rough (mainly because I have just started to learn to program and am only about 10% proficient in JavaScript!). Still, it works (but is hampered by lack of access to nymea metadata). I will report back on progress for those who are interested. Any comments, help or suggestions (polite ones please!) are always welcome 🙂

Regards


Hey @3more,

pretty nice research you did there. Especially the Rhasspy project seems very interesting.
You’re welcome to share the steps you took to set this up.
I suppose right now you’re using the MQTT client plugin and the script engine to wire things up. That seems a good approach to start playing with it; eventually I’d like to see such a thing fully integrated in nymea. I’ll try to find some time to jump on board and see what we can get out of this.

Thanks @mzanetti

I have been playing around with smart home stuff during lockdown, so I had already looked at a few different voice assistants. By the way, Rhasspy integrates with Home Assistant - but then so does everything! Also, Rhasspy has a sister project called Voice2Json, which is basically a cut-down version of Rhasspy for use at the command line - so it may be a candidate for integration.

I think I read on here that nymea has a VA integration in the pipeline - so I will watch that with interest!

It has just been a bit of fun seeing what can be done, and I will post a bit more later today. The script requires that any ‘Thing’ you want to control with the VA has its own ThingState entry - so not really viable long term, especially if you have a lot of Things!
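To give an idea of the scaling problem, the per-device entries look roughly like this (just a sketch - the thing IDs are placeholders, and I am using the ThingAction form that comes up further down):

// One hard-coded entry per controllable thing - fine for a couple of lamps,
// unmanageable for a house full of them.
ThingAction {
    id: frontRoomFloorLamp
    thingId: "placeholder-uuid-for-the-floor-lamp"
    actionName: "power"
}
ThingAction {
    id: kitchenLight
    thingId: "placeholder-uuid-for-the-kitchen-light"
    actionName: "power"
}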

Rhasspy intent data is JSON, so the device to control can be dynamic, but it needs to reference a Thing in nymea - this is done via tags in HA, Sepia, openHAB, FHEM etc. If a Thing was tagged with, say, “frontRoomFloorLamp”, then this could be used to dynamically create a ThingState - if that were possible (I don’t think it is). This would mean only a short script is needed to cover a wide range of devices, using location and name as the reference.

Rhasspy is very well documented, and they are starting to address issues such as noise interfering with the wake word and how to know when a command has ended (detecting silence when there is background noise).

Anyway - worth a look for anybody who is curious 🙂

Hi all,

OK, so here is my approach to the non-cloud voice assistant using Rhasspy:

  1. Edit the Rhasspy sentences file to include a set of rules for turning on/off various devices based on location and object.

These are the rules; the parts tagged with {} mean the recognised words are sent by Rhasspy as slots.

[On_Off]
location = (House | Downstairs | Bedroom | Kitchen | (Living Room):(livingRoom) | (Front Room):(frontRoom) | Bathroom){location}
object = (Lamp | Lamps | Light | Lights | (Floor Lamp):(FloorLamp) | (Floor Lamps):(FloorLamps) | (Table Lamp):(Table_Lamp) | (Table Lamps):(Table_Lamps)){object}
action = (on:true | off:false){action}
verb = (TURN | SWITCH | PUT | SET){verb}
group = (ALL | (THE):(SINGLE)){group}
/[<verb>] <action> <group> <location> <object> [in]
/[<verb>] <group> <location> <object> <action> [in]

The last two lines are the sentence templates needed to recognise the intent (notice that the verb - turn, switch, set etc. - and the word “in” are optional).

  2. Create a nymea MQTT client thing to connect either to an external MQTT broker (if Rhasspy is also set to do that) or directly to Rhasspy’s own MQTT broker. Subscribe to hermes/intent/# to receive Rhasspy intent data as JSON.

  3. Create a script (a sketch follows this list) with:

  • a ThingEvent that triggers when Rhasspy sends the JSON intent - this is the MQTT client Thing;
  • a ThingAction to change the power on a light - its id needs to be the combination of the location and object you want to control, e.g. frontRoomFloorLamp (notice that in the Rhasspy sentences I have provided a translation - Rhasspy hears “Front Room” but sends frontRoom). The id needs to start with a lowercase letter because it is a QML id;
  • a ThingAction to publish the response back to Rhasspy on the topic hermes/api/tts - to say what has been done.

  4. Parse the JSON data from Rhasspy and store the slots in variables - in this case there will be a value for location, object, action, verb and group.

  5. Create an action from the combination of location and object plus the action value, e.g. frontRoomFloorLamp power:true, and pass it to the Thing you want to control. I used frontRoomFloorLamp.execute({"power": }), substituting in the action variable from the slot data.

I haven’t used the verb or group yet - the group might need to be an interfaceEvent (work in progress).

  6. Publish a suitable response back to Rhasspy.
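To make the steps concrete, here is a rough sketch of what the script looks like. Treat it as a starting point only: the thing IDs are placeholders, the MQTT client thing’s event and action names (and their parameter names) are written from memory, and the slot layout follows Rhasspy’s Hermes intent messages - check all of these against your own setup.

import QtQuick 2.0
import nymea 1.0

Item {
    // Fires when the MQTT client thing receives a message on hermes/intent/#
    ThingEvent {
        thingId: "REPLACE-WITH-MQTT-CLIENT-THING-ID"
        eventName: "triggered"              // assumed event name - check your MQTT client thing
        onTriggered: {
            var intent = JSON.parse(params["data"])     // assumed parameter name
            // Collect the Rhasspy slots into a simple map,
            // e.g. { location: "frontRoom", object: "FloorLamp", action: "true", ... }
            var slots = {}
            for (var i = 0; i < intent.slots.length; i++) {
                slots[intent.slots[i].slotName] = intent.slots[i].value.value
            }
            var target = slots.location + slots.object  // e.g. "frontRoomFloorLamp"
            var power = (slots.action === "true")
            if (target === "frontRoomFloorLamp") {
                frontRoomFloorLamp.execute({"power": power})
                // Step 6: confirm via Rhasspy TTS
                say.execute({"topic": "hermes/api/tts",
                             "payload": "Turning the front room floor lamp " + (power ? "on" : "off")})
            }
        }
    }

    // Step 3: one ThingAction per controllable thing (this is the tedious part)
    ThingAction {
        id: frontRoomFloorLamp
        thingId: "REPLACE-WITH-LAMP-THING-ID"
        actionName: "power"
    }

    // Step 3: publish the spoken confirmation back to Rhasspy
    ThingAction {
        id: say
        thingId: "REPLACE-WITH-MQTT-CLIENT-THING-ID"
        actionName: "publish"               // assumed action and parameter names
    }
}

The if branch is exactly the per-thing boilerplate mentioned above; the rest of this thread looks at how to replace it with a dynamically chosen thingId.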

OK - so this is quick and dirty at the moment and needs more work - but I thought I would post it up because I am sure there is a better way! If the thingAction could be created dynamically then there would be no need to add each thing that you want to control.

Works OK but needs more effort. Happy to explain in more detail because I have tried to keep this brief for now.

Hey, I haven’t set up the whole thing yet, but as a quick reply on the dynamic-creation question: you can create actions dynamically with:

import QtQuick 2.0
import nymea 1.0

Item {
    id: root

    Component.onCompleted: {
        var action = actionComponent.createObject(root, {thingId: "5cd852dd-735d-491c-8630-470bbd78401c", actionName: "power"})
        action.execute({"power": true})
    }
    
    Component {
        id: actionComponent
        ThingAction {}
    }
}

or:

import QtQuick 2.0
import nymea 1.0

Item {
    id: root

    Component.onCompleted: {
        var action = Qt.createQmlObject(
            "import nymea 1.0; ThingAction {thingId: \"5cd852dd-735d-491c-8630-470bbd78401c\"; actionName: \"power\"}",
             root, "dynamic action")
        action.execute({"power": true})
    }
}

For more information see: Dynamic QML Object Creation from JavaScript | Qt QML 5.15.3

I guess it might even work to have a normal ThingAction and just reassign thingId dynamically before calling execute()
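Something like that could look roughly like this (untested sketch):

ThingAction {
    id: dynamicAction
    actionName: "power"
}

// ... later, when an intent comes in, point it at whichever thing the intent targets:
dynamicAction.thingId = "5cd852dd-735d-491c-8630-470bbd78401c"
dynamicAction.execute({"power": true})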

Oh, and you might want to call destroy() on dynamically created objects when you don’t need them any more… There is JS garbage collection, but depending on how you create stuff it might not trigger (e.g. if the parent is never destroyed)… You can attach something like this to the action to verify:

Component {
    id: actionComponent
    ThingAction {
        Component.onCompleted: console.log("Created new ThingAction")
        Component.onDestruction: console.log("Destroying ThingAction")
    }
}
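And, combining the snippets above, the explicit clean-up would be something like this (destroy() defers the actual deletion until the current script block has finished):

var action = actionComponent.createObject(root, {thingId: "5cd852dd-735d-491c-8630-470bbd78401c", actionName: "power"})
action.execute({"power": true})
action.destroy()   // queued for deletion once this JS block returns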

Hi @mzanetti

Thanks, that is useful. I am not sure I understand it all, but I will have a look at the documentation.

I think your last point is what I was originally thinking - find a Thing’s ID from the location+object value and then use it as the ID in one single ThingAction.

e.g. find the Thing with the label/tag frontRoomFloorLamp - if it exists, retrieve its ID - such as “74f8effa-2be7-46f5-8b43-f94cb6ce51e4” for the Front Room Floor Lamp - store that string in a variable, e.g. intentObject, and then use it in a single ThingAction:

ThingAction {
    id: locationObject
    thingId: intentObject
    actionName: "power"
}

Ok, so I tried this last night. It seems to work quite nicely indeed… Setup is a bit tedious but it’s a good start.

FWIW, you can configure Rhasspy to use nymea as its MQTT broker; then you don’t need an external mosquitto or whatever… That worked fine for me. Use the “Internal MQTT client” thing to interact with it. Will keep playing around with this.

Yes, there are several options for MQTT.

I was successful in getting Rhasspy to turn an interface group - lights, sockets etc. - on and off just by adding an interface action and including the location House in the command to Rhasspy. So, for example, “turn off the House lights” connects to the interface action.

So, if there were three such action Things with dynamic references to the id and thingId, most commands could be performed using just a short script.

Examples include: “turn off the kitchen light” - passes to a single nymea ThingAction.
“turn on all house lights” - passes to an interfaceAction based on the group location.
“turn on all living room lamps” - passes to a groupAction (if there were such a thing) based on logic using the group slot.

This would easily extend to “set the temperature to…” or “increase the volume to…” with just a few additional entries in the Rhasspy sentences file.
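As a rough sketch of the second case, the interface action could look something like this (untested, and the interfaceName/actionName property names are my assumption - check the nymea scripting docs for the exact form):

// One interface action covering every thing that implements the light interface,
// so "turn off the House lights" can switch them all in one go.
InterfaceAction {
    id: allLights
    interfaceName: "light"   // assumed property name
    actionName: "power"
}

// ... when the intent arrives with location == "House":
allLights.execute({"power": false})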

I know this is all rough and ready, but it works surprisingly well, and this approach is much more efficient than many other VA interfaces.

A bit of fun to do, although I think it would be easy for someone with development experience to create a plugin.

What I take from this is how flexible nymea really is. This is a very important point, because any smart home application that is easy to set up out of the box, as nymea is, must also cater for the 20% of cases where the user needs to do something creative - having a decent scripting engine makes all the difference. It is always a balance between really “geeky” solutions like FHEM that require in-depth technical skills, simpler stuff like Domoticz, WebThings etc., all-encompassing solutions like openHAB and ioBroker, and wide-ranging applications like Home Assistant.

My view is that nymea has some unique attributes: it is lightweight and therefore resource friendly, has some cool network stuff built in that makes it an ideal motorhome solution, is pretty and simple to use (without having to spend time learning YAML or CSS), has enough power under the hood to customise (even more with the facilities to customise Things), and has the security advantage of not being a web app. Amazing, and thanks.

Oh, and did I mention that I think it is “grandad” friendly 🙂 making it a great choice for “senior” home solutions.


Forgot to mention: if you are going to connect Rhasspy to nymea’s MQTT broker, set the wake word service to listen for audio on a UDP port and also set the audio recording (arecord) device to stream to the same UDP port. If you don’t do this, Rhasspy will send microphone data as MQTT WAV chunks and flood nymea. Using UDP means it only sends voice over MQTT after hearing the wake word.

Is there an advantage in general to using the nymea MQTT broker instead of an external one, e.g. performance, recovery etc.?

Right, I noticed the audio packets coming in on MQTT. Good to know that can be offloaded onto a dedicated UDP connection. I hadn’t seen that yet.

In general I don’t think it makes any difference whether you’re using the nymea-internal MQTT broker or an external one, except that it’s one service less to take care of.

I finally got around to writing up my project on Hackster. If you are interested, the article can be found at:
