Tutorial

Posts designed as walk-through tutorials on how to achieve a desired behavior.

On demand gaming server for pennies a month!

The only downside with most dedicated gaming servers is the cost. If you need a 24/7 server or someone to manage it for you, you can’t avoid it. However, if you are self-sufficient in setting up the gaming software and only play certain games intermittently, your costs can be reduced significantly.

Let’s use Factorio as an example. A dedicated server will cost around $5 to $10 per month, used or not. Now, if you are like me and play with friends and family only a few nights a month, that’s an unnecessary expense. Sure, you can self host, but then everyone is dependent on a single person to have it up and running. Instead, I switched to using a Digital Ocean Droplet, and have paid less than a quarter this past month for an on demand gaming server.

Even if we played every night after work and weekends, it would only be $1 or $2 a month. You may be thinking “oh, you just turn off the droplet when you’re not using it!”, but alas, it is not that simple. A turned off droplet or server is still holding onto resources, so it still incurs cost, which is standard across all the hosting giants I looked into.

The trick on how to save cash? When you’re not playing, turn the droplet off, snapshot it, then destroy the droplet. When you want to play again, restore the snapshot to a new droplet. That way you are only paying server costs while it is running, then paying the super cheap snapshot storage the rest of the time.

This is painful to do by hand, but really easy with a helper script. (If you want to take it further, you could make it a website with access controls, so that only the people you choose can control it whenever they want.)
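
For the curious, the whole cycle boils down to a handful of Digital Ocean API calls. Here is a rough sketch using the requests library against the public v2 API; this is not the DOGS code itself, which also handles waiting for actions to finish, error handling, and configuration.

import requests

API = "https://api.digitalocean.com/v2"
HEADERS = {"Authorization": "Bearer <your token>"}

def snapshot_and_destroy(droplet_id, snapshot_name):
    # Power the droplet down, snapshot it, then destroy it so it stops billing
    requests.post(f"{API}/droplets/{droplet_id}/actions",
                  headers=HEADERS, json={"type": "shutdown"})
    # ...wait for the droplet to report 'off' before snapshotting...
    requests.post(f"{API}/droplets/{droplet_id}/actions",
                  headers=HEADERS, json={"type": "snapshot", "name": snapshot_name})
    # ...wait for the snapshot action to complete...
    requests.delete(f"{API}/droplets/{droplet_id}", headers=HEADERS)

def restore(snapshot_id, name, region, size, ssh_key_id):
    # Restoring is just creating a new droplet with the snapshot as its image
    requests.post(f"{API}/droplets", headers=HEADERS,
                  json={"name": name, "region": region, "size": size,
                        "image": snapshot_id, "ssh_keys": [ssh_key_id]})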

Digital Ocean Gaming Service (DOGS)

The code repo is available on github and the instructions are boringly standard.

git clone https://github.com/cdgriffith/dogs.git
# Alternatively, just download and extract the zip file
# https://github.com/cdgriffith/dogs/archive/master.zip

# Create a venv if you are python savvy
pip install -r requirements.txt
cp config.yaml.example config.yaml
# Update the config file to match your digital ocean settings
python -m dogs

This script does have some prerequisites. You need to have a Digital Ocean account token, and will have to manually create and startup the server once before you can manage it via this script. (Pull Requests always considered if you want to ease that pain for others.)

Droplet setup

When you do create the droplet you want, make sure to pick the smallest specs you think you will need. You can always upgrade to a larger disk size, but you cannot go back down to a smaller one. Also during this process, record the hostname, which we will use as the server name in the config file.

You will also need your SSH key ID, which is the digits at the very end of the public key. For example, if the key ends with rsa-key-20190721, the SSH ID is 20190721. You can always find this info in the Accounts > Security section as well by selecting a key and hitting “Edit”.

Also, if you hook a firewall up to the droplet, you will need its ID as well, which you can retrieve via the API.
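
If you would rather not dig through the dashboard, here is a quick sketch of pulling the firewall IDs via the API (assumes the requests package; the same pattern works for the regions and sizes endpoints mentioned in the next section):

import requests

headers = {"Authorization": "Bearer <your token>"}
resp = requests.get("https://api.digitalocean.com/v2/firewalls", headers=headers)

for firewall in resp.json().get("firewalls", []):
    print(firewall["id"], firewall["name"])

# Swap the URL for /v2/regions or /v2/sizes to list those instead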

DOGS Server Config

Once you have all that info, add it to the config.yaml file.

token: <your 64 character hex string>
servers:
  ubuntu-1804-factorio:
    region: nyc1
    size: s-1vcpu-2gb
    firewall_id: <unique hex string separated by dashes>
    snapshot_max: 2
    ssh_key: <8 digits from end of ssh public cert>

You can get a full list of regions and sizes from their API as well.

Again, to run you just need to be in the dogs directory and run python -m dogs.

Binary (EXE) files

If you want to package it into an easy-to-use exe (the scripts can also be modified for Mac or Linux binaries), just use the included build scripts.

# Windows specific requirements, otherwise just install 'pyinstaller'
pip install -r requirements-build.txt
python dogs\build.py

And now you should have a super handy dogs.exe in the dist directory. Don’t forget to keep your config.yaml in the same directory as it!

Gaming Setup Scripts

I have a directory for files that set up and update game servers. Right now I only have Factorio, but feel free to add your own; a PR would be very appreciated! All you need is a way to near-automatically create/install the service and a way to have it auto start (in my example using systemd for standard Ubuntu servers), as sketched below.
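
For reference, the auto-start half is usually just a small systemd unit. This is only a rough sketch with assumed paths and user; the factorio script in the repo is the real source of truth.

# /etc/systemd/system/factorio.service -- sketch only, adjust paths and user
[Unit]
Description=Factorio headless server
After=network.target

[Service]
User=factorio
ExecStart=/opt/factorio/bin/x64/factorio --start-server /opt/factorio/saves/my-save.zip
Restart=on-failure

[Install]
WantedBy=multi-user.target

Enable it with systemctl enable factorio so the game server comes back up every time the restored droplet boots.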

Check the scripts out on github.

Replacing NZXT’s CAM software on Windows for Kraken

NZXT Kraken coolers are awesome for CPUs or GPUs. Their CAM software on the other hand is slow, bloated and possibly stealing your data. Thankfully, there are open source alternatives available.

The option that I will walk you through using is a command line tool that doesn’t need to be constantly running in the background, and doesn’t require any internet connection to work.

APRIL 2019 UPDATE: liquidctl now has the best support all around and automated windows builds! Use this and forget about everything below!

UPDATE: I have created a standalone executable for Windows for those that do not want to bother with the steps below. It is available on github. (For x61 and other second gen users, you will still need to update the driver as part of step five below.)

There are a few different libraries to control third generation coolers (Kraken x62, x72, x52 and x42). The only one I have found that supports fan control on Windows for second generation coolers (Kraken x61, x41 and x31) as well as third is liquidctl (as of 2/6/2019 still in an experimental dev branch; you can check the issue directly for updates).

Please note: none of this is my own software, and this is only a guide based on my own experience. This is fully “as-is”, no warranty or guarantee it won’t harm your hardware or other software.

I am going to get a little more detailed for these steps than usual, as I want to make this easy for anyone’s skill level. For power users, here are the abbreviated steps:

  1. Download and install Python 3.7+ x86 version
  2. Create a Python virtual env
    • Run the command python -m venv venv
    • Activate it: venv\Scripts\activate.bat
  3. Download libusb
    • Extract the files from MS32/dll into the venv/Scripts folder
    • You may need an extractor program like 7zip
  4. Download liquidctl
    • Make sure you are in the activated venv
    • pip install liquidctl
  5. Kraken x61 / gen two users only – Download zadig and install the libusbK driver for the Kraken device
    • WARNING – CAM will no longer be able to use this device unless you uninstall this new driver
    • Select Options > List all Devices
    • Find the device “690LC” in the dropdown list; it should have USB ID 2433 B200
  6. Run liquidctl to change your fan speed!
    • liquidctl set fan speed 60

As of time of writing, their software does not by default include the code for changing the logo color for the second generation Krakens (x61, etc.), but I’ll show you how to add it easily. Check out the liquidctl repo, which now has great support for older devices, as well as some EVGA and Corsair support!

Download Python

For those of you who do not already have it, we will need to make sure there is a version of Python 3 on your system. Go to the official download page from the Python Software Foundation at https://www.python.org/downloads/windows/ and click on the top link for “Latest Python 3 Release”.

Scroll down to the bottom of the next page, and select the “Windows x86 Executable Installer”. Yes, install the x86 version, aka 32-bit, even if you have 64-bit Windows. If you do happen to download and install the x86_64 version instead, you will just have to use a different libusb DLL (thankfully included in the same download bundle).

After the exe is downloaded, double click on it, and let's make a few changes during the installation to make your life easier. First, make sure to add Python to PATH. This will make it possible to run python from the command line without specifying the full path to its executable. Then click on “Customize installation”.

On the next page we don't need to touch anything. If you are only going to use Python for this, you can remove Documentation and the Python test suite; everything will still work fine with less bloat.

On the final page we just want to make sure it's installed in the Program Files directory instead of the obscure user app dir that it uses by default. Then hit Install.

You should now be able to open python easily in the command prompt. You can open it by hitting the Windows key on your keyboard (or manually opening start menu), typing “Command” and then click on “Command Prompt”.

C:\Users\me>python --version
Python 3.7.2

Create a virtual environment

By creating a virtual environment you isolate the new libraries you are going to install from the system version of Python, eliminating possible future headaches if you need to install conflicting packages.

Thankfully it is super simple to do, open a command prompt, navigate to the directory you want a new folder with the virtual environment in, and type:

python -m venv venv

This is simply saying “run Python, using the module venv (-m venv), to create a virtual env at the venv directory.” Then you want to “activate” it so that your command prompt will instead use that new isolated version of Python and Python tools (like pip).

venv\Scripts\activate.bat

This will cause your command prompt to give a nice little notification that you are now using that venv:

C:\Users\me> venv\Scripts\activate.bat

(venv) C:\Users\me>

Remember the path you installed the virtual env to, we will use it later!

Install libusb DLLs

Just because life has to be a little extra difficult, libusb only pushes their releases as 7z and tar.bz2 archives. That means, if you don't already have a tool that can open these types of files, head over to https://www.7-zip.org/download.html and download either the exe or msi for your version of Windows (if you are unsure, just get the 32-bit exe one).

Installing it should be fine with the defaults, feel free to change them as you want. When done, go over and download libusb. The developers of liquidctl suggest you use 1.0.21 at time of writing, so we are linking directly to that version: https://github.com/libusb/libusb/releases/tag/v1.0.21.

After download, extract those files with 7zip.

Then you will want to go into the new directory, and find the proper DLLs for your installed version of Python. If you followed this guide so far, that will be the MS32 folder.

You are going to copy those into the Scripts directory of your virtual environment, as that is by default added to the PATH when you activate the virtual env. So if I was in the C:\Users\me directory when I ran python -m venv venv, I would copy those three files into C:\Users\me\venv\Scripts.

They look a little out of place, but meh, it works.

Change the Kraken drivers (x61, x41, x31 Only)

Sadly (or not so sadly) we can't use the default second gen Kraken drivers installed by CAM. This also means that when we switch these, CAM will no longer be able to interface with the Kraken unless this new driver is removed via Device Manager.

ONLY FOR KRAKEN x61, x41 and x31! Skip ahead to the next section for anything else.

Go to https://zadig.akeo.ie/ and download their awesome driver switcher tool.

Run it as an administrator, then go to Options and select “List all Devices”.

Choose 690LC from the drop down list. Its USB ID should be 2433 B200. Select either the libusb-win32 or libusbK driver from the tiny select buttons on the right. (I had trouble with the libusb-win32 driver timing out the first time, so I switched to libusbK; your mileage may vary.) Then hit ‘Replace Driver’.

This will probably take a while. Hopefully it pops up a success window in a few seconds or minutes. If not, I suggest waiting a full ten minutes, then switching to the other driver type (aka select libusbK if you first tried libusb-win32) and trying again before running to their github for help.

Install liquidctl

We’re almost there!

Switch back to the command prompt and make sure you are in the active virtual env. If you don’t see (venv) at the start of the line, go back and take a look at the “Create a virtual environment” section.

If you are using this for a newer third generation device, like the Kraken x62, you can use the regular module.

pip install liquidctl

However, if you do need second generation support for the Kraken x61, x41 or x31, you will need to grab the experimental branch. Please be aware that this may be at an unstable state at time of your download. You can follow current issues directly in the pull request.

pip install git+https://github.com/jonasmalacofilho/liquidctl.git@add-second-gen-krakens

This will install the command in the active virtual environment as liquidctl! Keep in mind, you will need to activate this virtual environment to use this command.

Using liquidctl

You can find the full list of what you can do via --help or from their github. But to start you want to list out devices detected on the system, initialize them, and view their status.

(venv) C:\>liquidctl list
Device 0, Asetek 690LC (NZXT, EVGA or other) (experimental)

(venv) C:\>liquidctl initialize
report: failed to (pre) open (control: open), ignoring

(venv) C:\>liquidctl status
Device 0, Asetek 690LC (NZXT, EVGA or other) (experimental)
report: failed to (pre) open (control: open), ignoring
Liquid temperature          31.2  °C
Fan speed                   1320  rpm
Pump speed                  3600  rpm
Firmware version         0.7.0.0

For the Kraken x61 and gen-two devices, the only thing you can do* is set the fan speed. (*As of time of writing; it could have come a long way by whenever ‘now’ is. Also, you can use my “hack” below to change the logo color.)

(venv) C:\> liquidctl set fan speed 100

If you are using third gen devices, you can have fun with pump speeds or colors as well!

liquidctl set ring color fading 350017 ff2608
liquidctl set logo color spectrum-wave --speed slowest
liquidctl set pump speed 90

Multiple Devices

If you happen to have multiple supported devices on your system, you will have to select which one you want to use. First figure out which device is numbered as which.

(venv) C:\> liquidctl list
Device 0, Asetek 690LC (NZXT, EVGA or other) (experimental)
Device 1, NZXT Kraken X (X42, X52, X62 or X72)

Then just append --device <num> to any follow up commands.

liquidctl --device 1 set ring color spectrum-wave
liquidctl --device 0 set fan speed 100

You can see more detailed options including color modes for Kraken devices here.

Getting color working for the Kraken x61

If you don’t code and are never going to use this virtual environment for anything other than this tool, don’t fret. It’s a copy-paste operation! (For others that don’t want to modify their site-packages files and know what that means, you can do a local develop-mode install via setup.py and play to your heart’s content.)
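
If you do go the develop-mode route, it is roughly this (a sketch, run inside the activated venv, using the same experimental branch as the install step further down):

git clone https://github.com/jonasmalacofilho/liquidctl.git
cd liquidctl
git checkout add-second-gen-krakens
pip install -e .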

Okay, so all we need to do is add a new section of code to one of the files. You'll have to navigate a few layers deep into the virtual environment you created: in venv\Lib\site-packages\liquidctl\driver there is a file called asetek.py that you need to open. If you installed the full Python suite, you should be able to easily right click and edit it in IDLE.

Scroll to the bottom of the file, and paste the following block of code:

    def set_color(self, channel, mode, colors, speed):
        """Set the color of the logo."""
        modes = ('fixed', 'alternating', 'blinking', 'off')
        speeds = {
            'fastest': 1,
            'faster': 2,
            'normal': 3,
            'slower': 4,
            'slowest': 5
        }
        try:
            speed = int(speed)
        except ValueError:
            if speed not in speeds:
                LOGGER.warning('Speed must be a value between 1 and 255, setting to 1')
                speed = 1
            else:
                speed = speeds[speed]
        else:
            if speed < 1 or speed > 255:
                speed = 1
                LOGGER.warning('Speed must be a value between 1 and 255, setting to 1')

        if channel != 'logo':
            LOGGER.warning('Only "logo" channel supported for this device, falling back to that')

        if mode not in modes:
            raise NotImplementedError('Modes available are: {}'.format(",".join(modes)))

        if mode == 'off':
            color1, color2 = [0x00, 0x00, 0x00], [0x00, 0x00, 0x00]
        else:
            color1, *color2 = colors
            if len(color2) > 1:
                LOGGER.warning('Only maximum of 2 colors supported, ignoring further colors')
            color2 = color2[0] if color2 else [0x00, 0x00, 0x00]

        self._begin_transaction()
        data = ([0x10] +
                color1 +
                color2 +
                [0xff, 0x00, 0x00, 0x37,
                 speed,
                 speed,
                 0x00 if mode == 'off' else 0x01,
                 0x01 if mode == 'alternating' else 0x00,
                 0x01 if mode == 'blinking' else 0x00,
                 0x01, 0x00, 0x01])
        self._write(data)
        self._end_transaction_and_read()

Because Python is whitespace-sensitive, make sure the def set_color lines up with the def _write above it.

Then make sure to save the file, either by hitting Ctrl+S or going to File > Save. The options for colors are quite limited for second gen devices versus third gen, but at least you can use them!

Mode – Colors to supply
off – 0
fixed – 1
alternating – 2
blinking – 1

liquidctl --device 0 set logo color blinking ffffff --speed 2

Now you can issue a command to set the colors like normal. Provide --speed to set number of seconds between alternating colors or blinking.

If you have any issues with the colors command, please let me know. If you have issues with the program itself, please open an issue directly on liquidctl github page.

Windows Auto-start

Alright, I’ve played around and have everything set perfectly. But wait, I rebooted and it’s all gone! NOOOO!

Thankfully, it’s rather simple to have your configuration auto load on startup. We simply have to put the commands into a script, and put it into the windows auto start directory.

Open your run menu by hitting the Windows Key plus r at the same time (or typing in run to windows search). When the little box pops up, type in shell:startup and hit OK. It will take you to a deep dark folder like C:\Users\me\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup

Open your favorite text editor (or if you know what you're doing, just right click in the folder, create a new text file, then rename it with a .bat extension). Keep that window open, so we can copy its path.

Now to create the small script. This script’s first command is to go into the venv\Scripts directory, so make sure to change to the full path of where that is in your version. Then, simply modify the liquidctl commands to the ones you want to issue.

cd C:\YOUR_OWN_FULL_PATH_HERE_MAKE_SURE_TO_CHANGE\venv\Scripts
call activate.bat

liquidctl --device 0 set logo color fixed ffffff
liquidctl --device 0 set fan speed 30 25  40 50  50 100

liquidctl --device 1 set logo color fixed ffffff
liquidctl --device 1 set ring color fixed ffffff
liquidctl --device 1 set pump speed 30 50  40 75  50 100
liquidctl --device 1 set fan speed 30 25  50 50  60 75  70 100

Now when you go to save this new file, copy that long path from the open shell:startup directory into the save bar so it is saved into that directory. Then save the file with a name you will remember, like kraken_control.bat; just make sure it ends in .bat. In Notepad, you may have to change the Save as type dropdown to All files.

You should now see that file in the startup directory. Go ahead and double click on it to make sure it works. Easiest way to test is first manually change your Kraken colors to something different and make sure running the script changes them back.

If it doesn't work, most likely a command is mistyped. Copy it line for line into a cmd prompt and make sure it works for you there. (Or just run the .bat file from an open cmd prompt, if you know how, so you don't have to copy-paste.)

That's all for this tutorial. I hope you found it useful and stress free! If there is something that needs improvement, please let me know in the comments below.

Truffle: going from ganache to testnet (ropsten)

Truffle is an amazing suite of tools created by ConsenSys to develop smart contracts for the Ethereum blockchain network. However, it can be a bit jarring to make the leap from local development to the real test network, Ropsten.

Required Setup

For this walk through, I have Node.js (with npm) and the Truffle suite installed.

I will be using the default example truffle project, MetaCoin, that you can walk through how to unbox here or follow along using your own project.

First things first, if you do NOT have a package.json file yet, make sure to run npm init. This will turn the directory into a node package that we can easily manage with the npm package manager, so we don't have to download dependencies into the global package scope.

Now we can download all the things we are going to need:

npm install bip39 dotenv truffle-hdwallet-provider --save
  • bip39 – used to generate wallet mnemonic
  • dotenv – simple way to read environment variable files
  • truffle-hdwallet-provider – lets truffle sign transactions using a wallet generated from a mnemonic

We now have everything we need development-wise.

Storing Secrets outside the code

We will have to create a private key or mnemonic, and that means we need somewhere relatively secure to store it. For testnet stuff, this can be as simple as making sure it's not being put into version control alongside the code. To that end, we are going to use environment variables, and will store them in a file called .env (that's it, basically just an extension; make sure to add it to your .gitignore if you're using git). To learn more, check out the github page for dotenv. But for our purposes, all you need to know is that this file will have a format of:

ENV_VARIABLE_NAME=something
ANOTHER_ENV=something else

Accessing testnet

The easiest way to reach out to testnet is by using a provider. I personally like using infura.io (free, just requires registration). After you register and have your API key emailed to you, make sure you select the URL for the test network and add it to the .env file using a variable named ROPSTEN_URL.

ROPSTEN_URL=https://ropsten.infura.io/<your-api-key>

It’s also possible to use your own geth node set to testnet, but that is not required.

Next we are going to create our own wallet, if you already have one set up, like with MetaMask, you can skip this next part.

Creating your testnet wallet

So now you have a place to put your secrets, let's create some. This is where bip39 comes in: it will create a random mnemonic which can be used as the basis for the private key of a wallet. It will be a series of 12 random words.

We could put this generation in a file, but it’s easy enough to just do straight from the command line:

node -e "console.log(require('bip39').generateMnemonic())"

This will output 12 words. DO NOT SHARE THESE ANYWHERE. The ones I am using below are examples, and also should NOT be used. Put them in the .env file as the variable MNEMONIC. So your .env file should now contain:

MNEMONIC=candy maple cake sugar pudding cream honey rich smooth crumble sweet treat
ROPSTEN_URL=https://ropsten.infura.io/<your-api-key>

We have our seed, so it’s time to hook it into our code. In your truffle.js or truffle-config.js file, you will need to now import the environment variables and a wallet provider at the top of the file.

require('dotenv').config()
const HDWalletProvider = require('truffle-hdwallet-provider')

After that is added, we will move down to the exports section, where we are going to add a new network named ropsten. Then we are going to use the HDWalletProvider and supply it with the mnemonic and Infura URL provided via environment variables.

module.exports = {
  networks: {
    ropsten: {
      provider: () => new HDWalletProvider(
        process.env.MNEMONIC,
        process.env.ROPSTEN_URL),
      network_id: 3
    },
  },
}

Test and make sure everything’s working by opening a truffle console, specifying our new network.

truffle console --network ropsten

We can then get our public account address via the console.

truffle(ropsten)> web3.eth.getAccounts((err, accounts) => console.log(accounts))
[ '0x627306090abab3a6e1400e9345bc60c78a8bef57' ]

If you are seeing this same wallet address, you did it wrong. Go back and make your own mnemonic, don’t copy the candy one from above.

Funding the wallet

In your development environment, the wallet already has ETH in it to pay for gas and deploying the contract. On the mainnet, you will have to buy some real ETH. On testnet, you can get some for free by using a Faucet, such as https://faucet.ropsten.be/ or if you’re using MetaMask just use https://faucet.metamask.io/.

Make sure to use the address you gathered from the console for the faucet,  and soon you should have test funds to play around with and actually deploy your contract.

Deploying the Contract

Now is where the rubber meets the road: getting your contract out into the real (test) world.

truffle deploy --network ropsten

If everything is successful, you’ll get messages like these:

Using network 'ropsten'.

Running migration: 1_initial_migration.js
  Deploying Migrations...
  ... 0xefe70115c578c92bfa97154f70f9c3fbaa2b8400b1da1ee7cdxxxxxxxxxxxxxx
  Migrations: 0x6eeedefb64bd6ee6618ac54623xxxxxxxxxxxxxx
Saving successful migration to network...
  ... 0xd4294e35c166e2dca771ba9bf5eb3801bc1793e30db6a53d4dxxxxxxxxxxxxxx
Saving artifacts...
Running migration: 2_deploy_contracts.js
  Deploying Capture...
  ... 0x446d5e92d6976bb05c85bb95b243d6f7405af6bb12b3b6fe08xxxxxxxxxxxxxx
  Capture: 0x1d2f60c6ef979ca86f53af1942xxxxxxxxxxxxxx
Saving successful migration to network...
  ... 0x0b6f918ccc8e3b82cdf43038a2c32fe1fef66d0fa9aeb2260bxxxxxxxxxxxxxx
Saving artifacts...

Tada! You now have your custom contracts deployed to testnet!

Or, you got an out-of-gas error; it is not uncommon to have to adjust the gas price to get a contract onto the network, as truffle does not automatically figure that out for you. A follow up post will show how to calculate and adjust gas price as needed.

Discover AWS State Machines using Python Lambdas for an ETL process

Step Functions, State Machines, and Lambdas oh my! AWS has really been expanding what you can do without needing to actually stand up any servers. I’m going to walk through a very basic example of how to get going with your own Python code to create an ETL (Extract Transform Load) process using Amazon’s services. And don’t worry, all this goodness is included in the free tier!

The goal of this exercise will be to have an aggregation of news headlines downloaded and transformed into CSV format and uploaded to another service. We are going to achieve this by breaking up each step of the process into its own AWS Lambda.

What are Lambdas?

AWS Lambdas are a “serverless”, stateless way to run snippets of code with no extra initialization or shutdown time.

When to use Lambdas

They are great if you have small, highly reusable pieces of code that serve a single purpose. (If you have a few that go together really well, that's where state machines come in.) For example, some code that does image recognition that you need to use across multiple projects, or that you just want to run faster or be more accessible; Lambdas have several ways they can be initiated, including via an API you can define.

They will NOT fit your purpose if you need something that does a multitude of tasks, will run for a long time, uses a lot of memory or updates frequently.

Creating a Lambda

Creating your own is a lot easier than a lot of other tutorials seem to show. If you haven’t already, sign up for an AWS account. Then open your AWS console and search for Lambda.

You'll most likely be presented with a welcome screen. After clicking through "Get Started" (or whatever they updated it to this month), you'll have a screen where you can create new functions as well as check on existing ones.

See the big orange button that even Trump would be proud of? Click it.

As this is probably your first Lambda, you will have to create a new role. It's super simple, you don't even have to leave the page. Just give it a new name and a policy template. I used the Simple Microservice permissions as it seemed to fit the bill best.

Then you will be greeted with a page with a large amount of info and stuff going on. The part that we are going to be most concerned about is the Function Code area (and will also need Environment Variables to store API keys in).

It may seem like we need to set up triggers or resources for this information to go to, but as we plan to use these inside a state machine, that will handle all that bother for us.

ETL – Extract Transform Load

Now that we know how to make a Lambda, let's look at some code we could use with it. For the state machine we will create later, I want to have an entire process where I pull in information from an outside source (extract), modify it to fit my needs (transform) and then put it into my own system (load).

Extract

As stated above, this scenario involves pulling down data from a news source. In this case we are using News API, which allows you to create a free API key to grab top news headlines and links to their stories.

That code is dead simple:

import json
from urllib import request


def retrieve_news(key, source):
    url = f"https://newsapi.org/v2/top-headlines?sources={source}&apiKey={key}"
    with request.urlopen(url) as req:
        return json.loads(req.read())

print(retrieve_news('<your-api-key>', 'associated-press'))

If I wasn't using this in a Lambda, I would be using the wonderful Requests module instead, but Python 3's urllib is at least a lot better than 2's.

So now we need a way for the Lambda function to call this code and pass along the results in a manner we can use later. On the page where you fill in the code, under Function Code you'll see a field that lists the Handler; this is the entry point to your code. lambda_function.lambda_handler is the default, which means it will use the function lambda_handler inside the file lambda_function.py as the entry.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import os
import json
from urllib import request


def retrieve_news(key, source):
    url = f"https://newsapi.org/v2/top-headlines?sources={source}&apiKey={key}"
    with request.urlopen(url) as req:
        return json.load(req)


# What AWS Lambda calls
def lambda_handler(event, context):
    key = os.getenv('NEWSAPI_KEY')
    if not key:
        raise Exception('Not all environment variables set')

    if not event.get('source'):
        raise Exception('Source not set')

    return {'data': retrieve_news(key, event['source']),
            'source': event['source']}

There are two arguments passed into the function. The first is event, which is all the information sent to the lambda function (if using a standard JSON object this will be a dictionary, as seen above). The second is context, which is an object that will tell you about the current lambda function if necessary; you can learn more about it here, but it will not be used in this example.
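
If you ever do want to poke at context, it exposes a few handy attributes. A quick illustration (attribute names per the AWS Python docs; not used anywhere else in this walkthrough):

def lambda_handler(event, context):
    # A few of the fields AWS provides on the context object
    print(context.function_name)                   # name of this Lambda
    print(context.memory_limit_in_mb)              # configured memory limit
    print(context.get_remaining_time_in_millis())  # time left before timeout
    return event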

Testing the lambda

You may also notice that we are pulling the API key not from the event, but from an environment variable, so make sure to set that as well on the page. Last but not least, I would suggest increasing the timeout for the lambda to 10 seconds, from the default 3.

Before we go on and add the other functions, let's make sure this one works properly. At the top of the page, in the drop down beside Test and Actions on the right, click Configure test events; we are going to add a new one with the details that will be passed into the event dictionary.

{
  "source": "associated-press"
}

On the pop-up, copy in the above JSON and save it as a new test event.

Hit the test button at the top, and see the results. You should get a big green window that shows you how it ran. If you have a red error window, you will have to figure out what went wrong first.

Transform

This will be our second lambda, so we get to go through the process again of creating a new one (you can use the existing role from the last one) and copying this code into it. No environment variables needed this time!

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import csv
from io import StringIO


# What AWS Lambda calls
def lambda_handler(event, context):

    sio = StringIO()
    writer = csv.writer(sio)
    writer.writerow(["Source", "Title", "Author", "URL"])
    for article in event['data']['articles']:
        writer.writerow([
            article['source']['name'],
            article['title'],
            article['author'],
            article['url']
        ])

    csv_content = sio.getvalue()
    print(csv_content)

    return {'data': csv_content,
            'source': event['source']}


The tricky part here is that now you need good test data for it. Luckily, you can copy the output of the last Lambda (snippet provided below) to do just that.

{
  "data": {
    "status": "ok",
    "totalResults": 5,
    "articles": [
      {
        "source": {
          "id": "associated-press",
          "name": "Associated Press"
        },
        "author": "FRANCES D'EMILIO",
        "title": "Pope accepts resignation of McCarrick after sex abuse claims",
        "description": "VATICAN CITY (AP) — In a move described as unprecedented, Pope Francis has effectively stripped U.S. prelate Theodore McCarrick of his cardinal's title and rank following allegations of sexual abuse, including one involving an 11-year-old boy. The Vatican ann…",
        "url": "https://apnews.com/46e8e15911034e7f971c7542b60a6444",
        "urlToImage": "https://storage.googleapis.com/afs-prod/media/media:b5c82ad2f2b74b50ab9faccf51898309/2628.jpeg",
        "publishedAt": "2018-07-28T16:21:57Z"
      },
      {
        "source": {
          "id": "associated-press",
          "name": "Associated Press"
        },
        "author": "KEVIN FREKING",
        "title": "On trade policy, Trump is turning GOP orthodoxy on its head",
        "description": "WASHINGTON (AP) — President Donald Trump's trade policies are turning long-established Republican orthodoxy on its head, marked by tariff fights and now $12 billion in farm aid that represents the type of government intervention GOP voters railed against a de…",
        "url": "https://apnews.com/57cd042b57054e5790b9b444c561ac3b",
        "urlToImage": "https://storage.googleapis.com/afs-prod/media/media:90f04d837f514d0b984e25bd5153be8a/3000.jpeg",
        "publishedAt": "2018-07-28T16:20:11Z"
      },
      {
        "source": {
          "id": "associated-press",
          "name": "Associated Press"
        },
        "author": "SETH BORENSTEIN and FRANK JORDANS",
        "title": "Science Says: Record heat, fires worsened by climate change",
        "description": "Heat waves are setting all-time temperature records across the globe, again. Europe suffered its deadliest wildfire in more than a century, and one of nearly 90 large fires in the U.S. West burned dozens of homes and forced the evacuation of at least 37,000 p…",
        "url": "https://apnews.com/a4255779e2b6461b9cc8dbf24ea4b96c",
        "urlToImage": "https://storage.googleapis.com/afs-prod/media/media:f9b76dc0354e47caafcfad96c36443ca/3000.jpeg",
        "publishedAt": "2018-07-28T15:03:01Z"
      },
      {
        "source": {
          "id": "associated-press",
          "name": "Associated Press"
        },
        "author": "MICHAEL KUNZELMAN and LARRY NEUMEISTER",
        "title": "No mystery to Supreme Court nominee Kavanaugh's gun views",
        "description": "SILVER SPRING, Md. (AP) — Supreme Court nominee Brett Kavanaugh says he recognizes that gun, drug and gang violence \"has plagued all of us.\" Still, he believes the Constitution limits how far government can go to restrict gun use to prevent crime. As a federa…",
        "url": "https://apnews.com/c8fc0785b429497abf9621efcdb345e8",
        "urlToImage": "https://storage.googleapis.com/afs-prod/media/media:4c3619ea948b4c91b8f2fcdd50162d26/3000.jpeg",
        "publishedAt": "2018-07-28T14:11:06Z"
      },
      {
        "source": {
          "id": "associated-press",
          "name": "Associated Press"
        },
        "author": "HOPE YEN, JOSH BOAK and CHRISTOPHER RUGABER",
        "title": "AP FACT CHECK: Trump's hyped claims on economy, NKorea, vets",
        "description": "WASHINGTON (AP) — President Donald Trump received positive economic news this past week and twisted it out of proportion. That impulse ran through days of rhetoric as he hailed the success of a veterans program that hasn't started and saw progress with North …",
        "url": "https://apnews.com/5b405824a9d843a09a641754d84aa1ab",
        "urlToImage": "https://storage.googleapis.com/afs-prod/media/media:636c2c3068b94181ba3c5bcb8d2a3ae9/3000.jpeg",
        "publishedAt": "2018-07-28T12:30:33Z"
      }
    ]
  },
  "source": "associated-press"
}

Configure and run the test like before using the above data.

In this case I also printed the output so you could see that any standard output is captured by the logs.

Load

Now to actually submit this data to a server. You could set up your own, or use file.io, which is a free file dropper website, as the code below does. No API key needed!

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

from urllib import request, parse
import json


# What AWS Lambda calls
def lambda_handler(event, context):
    url = 'https://file.io'

    encoded_args = parse.urlencode({'text': event['data']}).encode('utf-8')

    with request.urlopen(url, encoded_args) as req:
        info = json.load(req)

    return {'data': info, 'source': event['source']}

Again, as this is reaching out to an external API, I would increase the Lambda's timeout limit from the default 3 seconds to 10 seconds.

Woo! We now have three Lambdas that can take each other's outputs in a row and do a full ETL process. Now let's put them together.

State Machines

AWS Step Functions allow for creating a set of various actions that run with each other, presented as a pretty auto-generated graph. Back at the console, find Step Functions.

Then create a new state machine.

This is probably the hardest part: the actual state machine definition. The state language can be confusing, but thankfully for our needs we don't need to do anything complicated.

You can use this code, and will just have to update the actual Resource links under Extract, Transform and Load. (You can even click on them and should be presented with a drop down of your previously created resources so you don’t have to copy the ARNs manually.)

{
  "StartAt": "Set Source",
  "States": {
    "Set Source": {
      "Type": "Pass",
      "Result": {"source": "associated-press"},
      "ResultPath": "$",
      "Next": "Extract"
    },
    "Extract": {
      "Type": "Task",
      "Resource": "<ARN>:function:google-news-extract",
      "ResultPath": "$",
      "Next": "Transform"
    },
    "Transform": {
      "Type": "Task",
      "Resource": "<ARN>:function:google-news-transform",
      "ResultPath": "$",
      "Next": "Load"
    },
     "Load": {
      "Type": "Task",
      "Resource": "<ARN>:function:google-news-load",
      "ResultPath": "$",
      "End": true
    }
  }
}

Notice the first step is not a task, but rather a pass-through state that sets the source. We could do this during initialization, but I wanted to highlight the ability to add information where needed.

After creation, we will need to start a new execution. It doesn’t need any input, but doesn’t hurt to include a comment if you want.

Then run it!

During the middle of an execution, it will show what has been run successfully and what is currently in progress or has erred. At any time, you can click on a specific block to see what its inputs and outputs were.

This state machine can then be run whenever you want to kick off the full ETL process!

Scheduling

For a process like this, you want to run it on a schedule. That means creating a new CloudWatch rule. Search for CloudWatch in the console, then click on Rules on the left hand side.

Then, click the big blue button.

It’s pretty simple to create a fixed rate schedule, and then just make sure to select the right state machine on the right side!
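
The schedule expression itself can be either fixed-rate or cron style (syntax per the CloudWatch Events documentation); for example:

rate(1 day) – run once every 24 hours
cron(0 13 ? * MON-FRI *) – run at 13:00 UTC on weekdays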

Uploading large files by chunking – featuring Python Flask and Dropzone.js

It can be a real pain to upload huge files. Many services limit their upload sizes to a few megabytes, and you don't want a single connection open forever either. The super simple way to get around that is to send the file in lots of small parts, aka chunking.

Chunking Food – Artwork by Clara Griffith

Finished code example can be downloaded here.

So there are going to be two parts to making this work: the front-end (website) and the backend (server). Let's start with what the user will see.

Webpage with Dropzone.js

Beautiful, ain’t it? The best part is, the code powering it is just as succinct.

<!doctype html>
<html lang="en">
<head>

    <meta charset="UTF-8">

    <link rel="stylesheet" 
     href="https://cdnjs.cloudflare.com/ajax/libs/dropzone/5.4.0/min/dropzone.min.css"/>

    <link rel="stylesheet" 
     href="https://cdnjs.cloudflare.com/ajax/libs/dropzone/5.4.0/min/basic.min.css"/>

    <script type="application/javascript" 
     src="https://cdnjs.cloudflare.com/ajax/libs/dropzone/5.4.0/min/dropzone.min.js">
    </script>

    <title>File Dropper</title>
</head>
<body>

<form method="POST" action='/upload' class="dropzone dz-clickable" 
      id="dropper" enctype="multipart/form-data">
</form>


</body>
</html>

This is using the dropzone.js library, which has no additional dependencies and decent CSS included. All you have to do is add the class “dropzone” to a form and it automatically turns it into one of their special drag and drop fields (you can also click and select).

However, by default, dropzone does not chunk files. Luckily, it is really easy to enable. We are going to add some custom JavaScript and insert it between the form and the end of the body.

</form>

<script type="application/javascript">
    Dropzone.options.dropper = {
        paramName: 'file',
        chunking: true,
        forceChunking: true,
        url: '/upload',
        maxFilesize: 1025, // megabytes
        chunkSize: 1000000 // bytes
    }
</script>

</body>

When enabling chunking, it will break up any files larger than the chunkSize and send them to the server over multiple requests. It accomplishes this by adding form data that has information about the chunk (uuid, current chunk, total chunks, chunk size, total size). By default, anything under that size will not have that information sent as part of the form data, and the server would have to have an additional logic path. Thankfully, there is the forceChunking option, which will always send that information, even if it's a smaller file. Everything else is pretty self-explanatory, but if you want more details about the possible options, just check out their list of configuration options.

Python Flask Server

Onto the backend. I am going to be using Flask, which is currently the most popular Python web framework (by github stars); other good options include Bottle and CherryPy. If you hate yourself or your colleagues, you could also use Django or Pyramid. There are a ton of good example Flask projects and boilerplates to start from. I am going to use one that I have created for my own use that fits my needs, but don't feel obligated to use it.

This type of upload will work with any real website back-end. You will simply need two routes: one that displays the frontend, and one that accepts the file as an upload. At first, let's just view what dropzone is sending us. In this example my project is called 'pydrop', and if you're using my FlaskBootstrap code, this is the views/templated.py file.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import logging
import os

from flask import render_template, Blueprint, request, make_response
from werkzeug.utils import secure_filename

from pydrop.config import config

blueprint = Blueprint('templated', __name__, template_folder='templates')

log = logging.getLogger('pydrop')


@blueprint.route('/')
@blueprint.route('/index')
def index():
    # Route to serve the upload form
    return render_template('index.html',
                           page_name='Main',
                           project_name="pydrop")


@blueprint.route('/upload', methods=['POST'])
def upload():
    # Route to deal with the uploaded chunks
    log.info(request.form)
    log.info(request.files)
    return make_response(('ok', 200))

Run the flask server and upload a small file (under the size of the chunk limit). It should log a single instance of a POST to /upload:

[INFO] werkzeug: 127.0.0.1 "POST /upload HTTP/1.1" 200 -

[INFO] pydrop: ImmutableMultiDict([
     ('dzuuid', '807f99b7-7f58-4d9b-ac05-2a20f5e53782'), 
     ('dzchunkindex', '0'), 
     ('dztotalfilesize', '1742'), 
     ('dzchunksize', '1000000'), 
     ('dztotalchunkcount', '1'), 
     ('dzchunkbyteoffset', '0')])

[INFO] pydrop: ImmutableMultiDict([
     ('file', <FileStorage: 'README.md' ('application/octet-stream')>)])

Let's break down what information we are getting:

dzuuid – Unique identifier of the file being uploaded

dzchunkindex – Which block number we are currently on

dztotalfilesize – The entire file’s size

dzchunksize – The max chunk size set on the frontend (note this may be larger than the actual chunk's size)

dztotalchunkcount – The number of chunks to expect

dzchunkbyteoffset – The byte offset at which to write the current chunk into the file being uploaded

Next, let’s upload something just a bit larger that will require it to be chunked into multiple parts:

[INFO] werkzeug: 127.0.0.1 "POST /upload HTTP/1.1" 200 -

[INFO] pydrop: ImmutableMultiDict([
    ('dzuuid', 'b4b2409a-99f0-4300-8602-8becbef24c91'), 
    ('dzchunkindex', '0'), 
    ('dztotalfilesize', '1191708'), 
    ('dzchunksize', '1000000'), 
    ('dztotalchunkcount', '2'), 
    ('dzchunkbyteoffset', '0')])

[INFO] pydrop: ImmutableMultiDict([
    ('file', <FileStorage: '04vfpknzx8z01.png' ('application/octet-stream')>)])



[INFO] werkzeug: 127.0.0.1 "POST /upload HTTP/1.1" 200 -

[INFO] pydrop: ImmutableMultiDict([
    ('dzuuid', 'b4b2409a-99f0-4300-8602-8becbef24c91'), 
    ('dzchunkindex', '1'),
    ('dztotalfilesize', '1191708'),  
    ('dzchunksize', '1000000'), 
    ('dztotalchunkcount', '2'), 
    ('dzchunkbyteoffset', '1000000')])

[INFO] pydrop: ImmutableMultiDict([
    ('file', <FileStorage: '04vfpknzx8z01.png' ('application/octet-stream')>)])

Notice how /upload has been called twice, and that the dzchunkindex and dzchunkbyteoffset have been updated accordingly. That means our upload function has to be smart enough to handle both brand new uploads and the later chunks of existing multipart uploads: for new uploads we create the file and start writing at the beginning, whereas for later chunks we open the existing file and only write data after what is already in it. Luckily, both can be accomplished with the same code. First open the file in append mode, then 'seek' to the proper offset (in this case we are relying on the offset provided by dropzone).

@blueprint.route('/upload', methods=['POST'])
def upload():
    # Remember the paramName was set to 'file', we can use that here to grab it
    file = request.files['file']

    # secure_filename makes sure the filename isn't unsafe to save
    save_path = os.path.join(config.data_dir, secure_filename(file.filename))

    # We need to append to the file, and write as bytes
    with open(save_path, 'ab') as f:
        # Goto the offset, aka after the chunks we already wrote 
        f.seek(int(request.form['dzchunkbyteoffset']))
        f.write(file.stream.read())
       
    # Giving it a 200 means it knows everything is ok
    return make_response(('Uploaded Chunk', 200))

At this point you should have a working upload script, tada!

But let's beef this up a little bit. The following code improvements make it so we don't overwrite existing files that have already been uploaded, check that the file size matches what we expect when we're done, and give a little more output along the way.

@blueprint.route('/upload', methods=['POST'])
def upload():
    file = request.files['file']

    save_path = os.path.join(config.data_dir, secure_filename(file.filename))
    current_chunk = int(request.form['dzchunkindex'])

    # If the file already exists it's ok if we are appending to it,
    # but not if it's a new file that would overwrite the existing one
    if os.path.exists(save_path) and current_chunk == 0:
        # 400 and 500s will tell dropzone that an error occurred and show an error
        return make_response(('File already exists', 400))

    try:
        with open(save_path, 'ab') as f:
            f.seek(int(request.form['dzchunkbyteoffset']))
            f.write(file.stream.read())
    except OSError:
        # log.exception will include the traceback so we can see what's wrong 
        log.exception('Could not write to file')
        return make_response(("Not sure why,"
                              " but we couldn't write the file to disk", 500))

    total_chunks = int(request.form['dztotalchunkcount'])

    if current_chunk + 1 == total_chunks:
        # This was the last chunk, the file should be complete and the size we expect
        if os.path.getsize(save_path) != int(request.form['dztotalfilesize']):
            log.error(f"File {file.filename} was completed, "
                      f"but has a size mismatch."
                      f"Was {os.path.getsize(save_path)} but we"
                      f" expected {request.form['dztotalfilesize']} ")
            return make_response(('Size mismatch', 500))
        else:
            log.info(f'File {file.filename} has been uploaded successfully')
    else:
        log.debug(f'Chunk {current_chunk + 1} of {total_chunks} '
                  f'for file {file.filename} complete')

    return make_response(("Chunk upload successful", 200))

Now let's give this a try:

[DEBUG] pydrop: Chunk 1 of 6 for file DSC_0051-1.jpg complete
[DEBUG] pydrop: Chunk 2 of 6 for file DSC_0051-1.jpg complete
[DEBUG] pydrop: Chunk 3 of 6 for file DSC_0051-1.jpg complete
[DEBUG] pydrop: Chunk 4 of 6 for file DSC_0051-1.jpg complete
[DEBUG] pydrop: Chunk 5 of 6 for file DSC_0051-1.jpg complete
[INFO] pydrop: File DSC_0051-1.jpg has been uploaded successfully

Sweet! But wait, what if we remove the directories where the files are stored? Or try to upload the same file again?

(Dropzone's text out of the box is a little hard to read, but it says "File already exists" on the left and "Not sure why, but we couldn't write the file to disk" on the right. Exactly what we'd expect.)

2018-05-28 14:29:19,311 [ERROR] pydrop: Could not write to file
Traceback (most recent call last):
    ....
FileNotFoundError: [Errno 2] No such file or directory:

We get an error message on the webpage and in the logs. Perfect.

I hope you found this information useful and if you have any suggestions on how to improve it, please let me know!

Thinking further down the road

In the long term I would have a database or some other permanent storage option to keep track of file uploads. That way you could see if one fails or stops halfway and be able to remove incomplete ones. I would also save files first into a temp directory based off their UUID, then, when complete, move them to a place based off their file hash, as sketched below. It would also be nice to have a page to see everything uploaded and manage directories or other options, or even password-protected uploads.
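
A rough sketch of that idea, with hypothetical helper names and paths (none of this is in the example project): write chunks into a temp directory keyed on dzuuid, and only move the finished file into place under its hash.

import hashlib
import os
import shutil


def finalize_upload(dzuuid, filename, temp_root='/tmp/uploads', final_root='/data/uploads'):
    """Move a completed upload from its per-UUID temp dir to a hash-based path."""
    temp_path = os.path.join(temp_root, dzuuid, filename)

    # Hash the finished file so identical uploads land in the same place
    sha256 = hashlib.sha256()
    with open(temp_path, 'rb') as f:
        for block in iter(lambda: f.read(1 << 20), b''):
            sha256.update(block)

    final_dir = os.path.join(final_root, sha256.hexdigest())
    os.makedirs(final_dir, exist_ok=True)
    shutil.move(temp_path, os.path.join(final_dir, filename))
    os.rmdir(os.path.join(temp_root, dzuuid))  # clean up the now-empty temp dir
    return os.path.join(final_dir, filename)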