Exporting GitHub Projects (beta) with GraphQL and Python

GitHub recently launched a beta for their new project boards, which provide a lot more flexibility than the old Kanban boards. I've been excited about the new functionality ever since I heard of the 'Memex' project.

With the new boards comes a new API, and like Discussions before it there's no REST API as GitHub continues the journey into it's v4 GraphQL API. So things are a little different, but they can also be made to work much the same as before.

I previously wrote about Planning Poker, without Copy/Paste where we used some Python scripts 'dump_cards' to scrape data from the REST API. The key functions of those scripts were refactored into an atdumpcards.py module, leaving the scripts themselves as thin wrappers. To work with Projects(beta) we needed a new module atdumpmemex.py to construct GraphQL queries and parse the responses. The rest of this post will walk through that module to explain what's going on...

#!/usr/bin/env python3
import base64, json, os, requests

One more import than before for base64, which we'll come to later... Requests is used to send a request to the API, the response contains JSON, and the OS is being used to write a (.csv) file.

# Color constants
# Reference: https://gist.github.com/chrisopedia/8754917
COLERR="\033[0;31m"
COLINFO="\033[0;35m"
COLRESET="\033[m"

For colourful error messages.

graphqlurl = 'https://api.github.com/graphql'
token = os.environ['GITHUB_API_TOKEN']
headers = {"Content-Type": "application/json", 
    "Accept": "application/json",
    "Authorization": "Bearer " + token,
    "GraphQL-Features": "projects_next_graphql" }

This is where the GraphQL changes kick in. There's a single API end point for all queries. The authentication token also needs to be passed as a Bearer Token, and the GraphQL features for 'Project Next' need to be enabled.

def list_memex_projects(org):
    query = ('query{ organization(login: \\"' + org + '\\") '
        '{ projectsNext(first: 20) { nodes { id title } } } }')
    response = requests.post(graphqlurl, 
        headers=headers, 
        data='{"query": '+'\"' + query + '\"}')
    if response.status_code != 200:
        # An error occured
        print(COLERR + "Error getting project list : "
            + str(response.status_code) + " " + response.text + COLRESET)

This function lists the projects in a given org, and starts by constructing a GraphQL query to to just that. The query is filtering by organization based on login, and then retrieving the first 20 projects with the ID and title for each. That query is then POSTed to the API end point, and the response is captured. If the response code is something other than 200 then there's a problem and an error is output.

    json_projects = json.loads(response.text)
    for node in json_projects["data"]["organization"]["projectsNext"]["nodes"]:
        project_id = base64.b64decode(node["id"]).decode("utf-8")
        print(f'{project_id}  {node["title"]}')

A json_projects dictionary is then created from loading the JSON contained in the text of the API request. The JSON produced by the GraphQL API is quite deeply nested, so the function iterates over the nodes contained in the data:organization:projectsNext:nodes part of the response.

The GraphQL IDs look something like MDEyOlByb2plY3ROZXh0MzQ1Ng== which is a base64 encoded representation of 012:ProjectNext3456. The latter is easier on the eye, so the function decodes the ID into bytes, and decodes those into a string. The ID and title for each project can then be printed to the terminal.

def list_memex_columns(project_id):
    b64id = base64.b64encode(project_id.encode("ascii")).decode("utf-8")
    query = ('query{ node(id: \\"' + b64id + '\\")  '
        '{ ... on ProjectNext { fields(first: 20) '
        '{ nodes { id name settings } } } } }')

The list_memex_columns function works along similar lines. Except this time it's constructing a query to get the ID, name and settings from a given project.

    for node in json_nodes["data"]["node"]["fields"]["nodes"]:
        if node["name"] == "Status":
            json_status = json.loads(node["settings"])
            for options in json_status["options"]:
                print(f'{options["id"]} {options["name"]}')

Within settings it then drills down for nodes named 'Status' to get the JSON payload that contains the column IDs and names.

def list_memex_cards(column_id, project_id):
    cards_file = column_id + ".csv"
    b64id = base64.b64encode(project_id.encode("ascii")).decode("utf-8")
    query = ('query{ node(id: \\"' + b64id + '\\") '
        '{ ... on ProjectNext { items(first: 100) '
        '{ nodes{ title fieldValues(first: 8) { nodes{ value } } '
        'content{ ...on Issue { number labels(first: 50) '
        '{ nodes{ name } } } } } } } } }')

With the ID in hand a query can then be used to grab the issues associated with a column.

Unfortunately the GraphQL API doesn't yet provide for filtering within queries, so it has to be done in the script:

    json_cards = json.loads(response.text)
    f = open(cards_file, 'w')
    f.write("Issue Key,Summary,Description,Acceptance Criteria,Story Points\n")
    for card in json_cards["data"]["node"]["items"]["nodes"]:
        for status in card["fieldValues"]["nodes"]:
            if status["value"] == column_id:
                f.write(f'{card["content"]["number"]},{card["title"]},,,')
                for label in card["content"]["labels"]["nodes"]:
                        if (label["name"][-2:]=="SP"):
                            f.write (f'{label["name"].partition(" ")[0]}')
                            # break loop in case there are multiple SP labels
                            break
                f.write ('\n')
    f.close

The function uses a set of nested loops and conditionals to first extract just the issues in the desired column, and then any story points from labels on those issues. The extracted data is written out to a .csv file that can then be imported into Planning Poker.