docs: tutorial: dataflow: chatbot: Gitter bot

aghinsa · web-flow · commit dabfb178a6b8 · 2020-07-24T15:44:50.000-07:00
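
Note on the `dffml/df/base.py` hunk: the bound operation is now called first and awaited only if the call returned an awaitable. This presumably lets method-style operations written as async generators (such as `stream_chat`, which uses `yield`) share the same code path, since an async generator object cannot be awaited. Below is a minimal standalone sketch of that pattern. `Example`, `call_bound`, and the method names are hypothetical and are not part of DFFML.

```python
import asyncio
import inspect


class Example:
    # Hypothetical class standing in for an operation implementation context
    def plain(self, x):
        return x + 1

    async def coro(self, x):
        await asyncio.sleep(0)
        return x + 1

    async def stream(self, n):
        for i in range(n):
            yield i


async def call_bound(obj, func, **inputs):
    # Bind the function to the instance, as the hunk does with func.__get__
    bound = func.__get__(obj, obj.__class__)
    result = bound(**inputs)
    # Await only when the call produced an awaitable (a coroutine here)
    if inspect.isawaitable(result):
        result = await result
    return result


async def main():
    obj = Example()
    plain_result = await call_bound(obj, Example.plain, x=1)
    coro_result = await call_bound(obj, Example.coro, x=1)
    # An async generator object is not awaitable, so it comes back as-is
    agen = await call_bound(obj, Example.stream, n=3)
    streamed = [item async for item in agen]
    return plain_result, coro_result, streamed


results = asyncio.run(main())
print(results)
```

With the previous `await bound(**inputs)`, the `plain` and `stream` calls in this sketch would raise `TypeError`, because neither returns an awaitable.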
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -66,6 +66,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   HTTP 307 response
 - Support for immediate response in HTTP service
 - Daal4py example usage.
+- Gitter chatbot tutorial.
 ### Changed
 - Renamed `-seed` to `-inputs` in `dataflow create` command
 - Renamed configloader/png to configloader/image and added support for loading JPEG and TIFF file formats
diff --git a/dffml/df/base.py b/dffml/df/base.py
@@ -460,7 +460,9 @@ async def run(
                         # We can't pass self to functions running in threads
                         # Its not thread safe!
                         bound = func.__get__(self, self.__class__)
-                        result = await bound(**inputs)
+                        result = bound(**inputs)
+                        if inspect.isawaitable(result):
+                            result = await result
                     elif inspect.iscoroutinefunction(func):
                         result = await func(**inputs)
                     else:
diff --git a/docs/tutorials/dataflows/chatbot.rst b/docs/tutorials/dataflows/chatbot.rst
@@ -0,0 +1,129 @@
+Gitter ML Inference Chatbot
+===========================
+
+This tutorial shows how to use configs in DFFML operations. We'll be implementing
+a Gitter chatbot. Let's take a look at the final result before moving forward.
+
+.. image:: ./data/gitter.gif
+
+Let's get started.
+We'll use Gitter's Streaming API to collect chats. For this we need an
+authorization token from Gitter. Go to https://developer.gitter.im/apps
+and get the personal access token for your chatbot (if you are redirected to the Gitter docs
+from this URL, sign in and try again).
+
+Our dataflow takes a Gitter room URI as input (for https://gitter.im/dffml/community,
+``dffml/community`` is the URI), listens to chats in the room, and replies to
+messages which are directed to our bot.
+
+.. note::
+
+    All the code for this example is located under the
+    `examples/dataflow/chatbot <https://github.com/intel/dffml/blob/master/examples/dataflow/chatbot>`_
+    directory of the DFFML source code.
+
+We'll write the operations for this dataflow in ``operations.py``.
+
+.. literalinclude:: /../examples/dataflow/chatbot/operations.py
+    :lines: 24-51
+
+All requests to Gitter's API require the room id of our room.
+``get_room_id`` gets the ``room id`` from the room name (the input to
+our dataflow).
+
+.. literalinclude:: /../examples/dataflow/chatbot/operations.py
+    :lines: 52-87
+
+``stream_chat`` listens for new messages and yields only those directed to our bot.
+
+.. literalinclude:: /../examples/dataflow/chatbot/operations.py
+    :lines: 90-122
+
+We'll use this op, ``send_message``, to send replies back to the chat room.
+
+.. literalinclude:: /../examples/dataflow/chatbot/operations.py
+    :lines: 125-220
+
+This is the operation where all the logic for interpreting the messages
+goes. If you have a Natural Language Understanding module, it would go here, so
+that you can parse unstructured data.
+
+Our operations are ``get_room_id``, ``stream_chat``, ``send_message`` and ``interpret_message``.
+All of them use at least one config. The common config is ``INISecretConfig``, which
+loads the secret token and bot name from the ini config file.
+
+.. literalinclude:: /../examples/dataflow/chatbot/configs.ini
+
+Detour: What are imp_enter and ctx_enter?
+-----------------------------------------
+
+.. code-block:: python
+
+    config_cls=GitterChannelConfig,
+    imp_enter={"secret": lambda self: self.config.secret},
+    ctx_enter={"sctx": lambda self: self.parent.secret()},
+
+This piece of code in the ``op`` decorator says that the operation uses
+``GitterChannelConfig``. ``imp_enter`` and ``ctx_enter`` are shortcuts for
+the double context entry pattern followed in DFFML.
+
+``"secret": lambda self: self.config.secret``: sets the ``secret`` attribute of the parent
+to what the function returns; in this case, a ``BaseSecret``.
+
+``"sctx": lambda self: self.parent.secret()``: calls the function and assigns the
+return value to the ``sctx`` attribute.
+
+So in the operation instead of
+
+.. code-block:: python
+
+    async with self.config.secret as secret:
+        async with secret() as sctx:
+            sctx.call_a_method()
+
+we can do
+
+.. code-block:: python
+
+    self.sctx.call_a_method()
+
+Running the dataflow
+--------------------
+
+.. literalinclude:: /../examples/dataflow/chatbot/run.py
+
+Set the room name and the config file name, then run the dataflow:
+
+.. code-block:: console
+
+    python run.py
+
+Or use the command line to create the dataflow
+
+.. code-block:: console
+
+    dffml dataflow create \
+        operations:get_room_id \
+        operations:stream_chat \
+        operations:send_message \
+        operations:interpret_message \
+        -config \
+            ini=operations:get_room_id.secret.plugin \
+            configs.ini=operations:get_room_id.secret.config.filename \
+            ini=operations:stream_chat.secret.plugin \
+            configs.ini=operations:stream_chat.secret.config.filename \
+            ini=operations:send_message.secret.plugin \
+            configs.ini=operations:send_message.secret.config.filename \
+            ini=operations:interpret_message.secret.plugin \
+            configs.ini=operations:interpret_message.secret.config.filename \
+    > chatbot_df.json
+
+And run it by providing the ``room_name`` as the input
+
+.. code-block:: console
+
+    dffml dataflow run records all \
+        -dataflow ./chatbot_df.json \
+        -inputs test_community1/community=room_name \
+        -sources m=memory \
+        -source-records temp
diff --git a/docs/tutorials/dataflows/index.rst b/docs/tutorials/dataflows/index.rst
@@ -9,4 +9,5 @@ Here we have some examples to better understand the DFFML DataFlows.
 
     locking
     io
+    chatbot
     nlp
diff --git a/examples/dataflow/chatbot/configs.ini b/examples/dataflow/chatbot/configs.ini
@@ -0,0 +1,6 @@
+[secrets]
+access_token = EnterAccessToken
+botname = UserNameOfBot
+api_url = https://api.gitter.im/v1
+stream_url = https://stream.gitter.im/v1
+
diff --git a/examples/dataflow/chatbot/operations.py b/examples/dataflow/chatbot/operations.py
@@ -0,0 +1,220 @@
+import io
+import re
+import sys
+import json
+import tempfile
+import contextlib
+from aiohttp import ClientSession, ClientTimeout
+
+from dffml.cli.cli import CLI
+from dffml import op, config, Definition, BaseSecret
+
+ACCESSTOKEN = Definition(name="access_tok3n", primitive="str")
+ROOMNAME = Definition(name="room_name", primitive="str")
+ROOMID = Definition(name="room_id", primitive="str")
+MESSAGE = Definition(name="message", primitive="str")
+TOSEND = Definition(name="to_send", primitive="str")
+
+
+@config
+class GitterChannelConfig:
+    secret: BaseSecret
+
+
+@op(
+    inputs={"room_uri": ROOMNAME},
+    outputs={"room_id": ROOMID},
+    config_cls=GitterChannelConfig,
+    imp_enter={
+        "secret": lambda self: self.config.secret,
+        "session": lambda self: ClientSession(trust_env=True),
+    },
+    ctx_enter={"sctx": lambda self: self.parent.secret()},
+)
+async def get_room_id(self, room_uri):
+    # Get unique roomid from room uri
+    access_token = await self.sctx.get("access_token")
+    headers = {
+        "Content-Type": "application/json",
+        "Accept": "application/json",
+        "Authorization": f"Bearer {access_token}",
+    }
+
+    api_url = await self.sctx.get("api_url")
+    url = f"{api_url}/rooms"
+    async with self.parent.session.post(
+        url, json={"uri": room_uri}, headers=headers
+    ) as resp:
+        response = await resp.json()
+        return {"room_id": response["id"]}
+
+
+@op(
+    inputs={"room_id": ROOMID},
+    outputs={"message": MESSAGE},
+    config_cls=GitterChannelConfig,
+    imp_enter={
+        "secret": lambda self: self.config.secret,
+        "session": lambda self: ClientSession(
+            trust_env=True, timeout=ClientTimeout(total=None)
+        ),
+    },
+    ctx_enter={"sctx": lambda self: self.parent.secret()},
+)
+async def stream_chat(self, room_id):
+    # Listen to messages in room
+    access_token = await self.sctx.get("access_token")
+    headers = {
+        "Accept": "application/json",
+        "Authorization": f"Bearer {access_token}",
+    }
+    stream_url = await self.sctx.get("stream_url")
+
+    url = f"{stream_url}/rooms/{room_id}/chatMessages"
+    botname = await self.sctx.get("botname")
+
+    async with self.parent.session.get(url, headers=headers) as resp:
+        async for data in resp.content:
+            # Gitter sends " \n" at some intervals
+            if data == " \n".encode():
+                continue
+            print(f"\n\n Got data {data} \n\n")
+            data = json.loads(data.strip())
+            message = data["text"]
+            # Only listen to messages directed to bot
+            if f"@{botname}" not in message:
+                continue
+            yield {"message": message}
+
+
+@op(
+    inputs={"message": TOSEND, "room_id": ROOMID},
+    config_cls=GitterChannelConfig,
+    imp_enter={
+        "secret": lambda self: self.config.secret,
+        "session": lambda self: ClientSession(trust_env=True),
+    },
+    ctx_enter={"sctx": lambda self: self.parent.secret()},
+)
+async def send_message(self, message, room_id):
+    access_token = await self.sctx.get("access_token")
+    headers = {
+        "Content-Type": "application/json",
+        "Accept": "application/json",
+        "Authorization": f"Bearer {access_token}",
+    }
+    try:
+        message = json.loads(message)
+        message = json.dumps(message, indent=4, sort_keys=True)
+    except (TypeError, ValueError):
+        pass
+
+    # For a new line we need \\n, else the Gitter API
+    # responds with 'Bad Request'
+    message = message.replace("\n", "\\n")
+    api_url = await self.sctx.get("api_url")
+    url = f"{api_url}/rooms/{room_id}/chatMessages"
+
+    async with self.parent.session.post(
+        url, headers=headers, json={"text": message}
+    ) as resp:
+        response = await resp.json()
+        return
+
+
+@op(
+    inputs={"message": MESSAGE,},
+    outputs={"message": TOSEND},
+    config_cls=GitterChannelConfig,
+    imp_enter={"secret": lambda self: self.config.secret},
+    ctx_enter={"sctx": lambda self: self.parent.secret()},
+)
+async def interpret_message(self, message):
+    greet = ["hey", "hello", "hi"]
+    for x in greet:
+        if x in message.lower():
+            return {"message": "Hey Hooman ฅ^•ﻌ•^ฅ"}
+
+    def extract_data(raw_data):
+        raw_data = raw_data.split("data:")
+        data = {"model-data": raw_data[1]}
+        raw_data = raw_data[0].split("\n")
+        for x in raw_data:
+            k, *v = x.split(":")
+            if isinstance(v, list):  # for features
+                v = ":".join(v)
+            k = k.strip()
+            v = v.strip()
+            if k:  # avoid blank
+                data[k] = v
+        return data
+
+    # Removing username from message
+    # The regex matches @ followed by anything that
+    # is not a whitespace in the first group and
+    # the rest of the string in the second group.
+    # We replace the string by the second group.
+    message = re.sub(r"(@[^\s]+)(.*)", r"\2", message).strip()
+
+    if message.lower().startswith("train model"):
+        return {"message": "Gimme more details!!"}
+
+    elif message.lower().startswith("predict:"):
+        # Only replace first occurrence of predict
+        # because the feature to predict will be labeled predict
+        raw_data = message.replace("predict:", "", 1).strip()
+        cmds = ["predict", "all"]
+
+    elif message.lower().startswith("details:"):
+        raw_data = message.replace("details:", "",).strip()
+        cmds = ["train"]
+
+    else:
+        return {"message": " Oops ,I didnt get that ᕙ(⇀‸↼‶)ᕗ "}
+
+    # If predict or train, extract data
+    data = extract_data(raw_data)
+    if "model-type" in data:
+        model_type = data["model-type"]
+    if "model-name" in data:
+        model_name = data["model-name"]
+    else:
+        model_name = "mymodel"
+
+    features = data["features"].split(" ")
+    predict = data["predict"]
+    model_data = data["model-data"]
+
+    with tempfile.NamedTemporaryFile(suffix=".csv") as fileobj:
+        fileobj.write(model_data.lstrip().encode())
+        fileobj.seek(0)
+
+        stdout = io.StringIO()
+        with contextlib.redirect_stdout(stdout):
+            preds = await CLI.cli(
+                *cmds,
+                "-model",
+                model_type,
+                "-model-directory",
+                model_name,
+                "-model-features",
+                *features,
+                "-model-predict",
+                predict,
+                "-sources",
+                "f=csv",
+                "-source-filename",
+                fileobj.name,
+            )
+            sys.stdout.flush()
+
+    if "train" in cmds:
+        return {"message": "Done!!"}
+    else:
+        m = {}
+        for pred in preds:
+            pred = pred.predictions()
+            m.update({p: pred[p]["value"] for p in pred})
+        message = [f"{k}: {v}" for k, v in m.items()]
+        message = "\n".join(message)
+    return {"message": message}
diff --git a/examples/dataflow/chatbot/run.py b/examples/dataflow/chatbot/run.py
diff --git a/examples/dataflow/chatbot/test.py b/examples/dataflow/chatbot/test.py
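
Note on the tutorial's `imp_enter` / `ctx_enter` detour: the "double context entry" it mentions means entering an object first and then entering the context that object returns when called. The sketch below shows that pattern in plain Python, both written out by hand and collected on one exit stack. It is an illustration only, not DFFML's implementation; `FakeSecret`, `FakeSecretContext`, `long_form`, and `short_form` are hypothetical names, and the `botname` value is the placeholder from `configs.ini`.

```python
import asyncio
import contextlib


class FakeSecretContext:
    # Hypothetical stand-in for the context object returned by secret()
    def __init__(self, parent):
        self.parent = parent

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        return None

    async def get(self, name):
        return self.parent.values[name]


class FakeSecret:
    # Hypothetical stand-in for a secret plugin; calling it creates a context
    def __init__(self, values):
        self.values = values

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        return None

    def __call__(self):
        return FakeSecretContext(self)


async def long_form(secret):
    # First entry: the object itself. Second entry: the context it creates.
    async with secret as entered:
        async with entered() as sctx:
            return await sctx.get("botname")


async def short_form(secret):
    # The same two entries collected on one exit stack, which is the kind of
    # bookkeeping a shortcut can do on the operation's behalf
    async with contextlib.AsyncExitStack() as stack:
        entered = await stack.enter_async_context(secret)
        sctx = await stack.enter_async_context(entered())
        return await sctx.get("botname")


secret = FakeSecret({"botname": "UserNameOfBot"})
long_result = asyncio.run(long_form(secret))
short_result = asyncio.run(short_form(secret))
print(long_result, short_result)
```

Both forms return the same value; the second keeps the nesting out of the code that uses `sctx`.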
