Usage

Installation

Use pip to install the latest stable version of toolforge:

python3 -m pip install --upgrade toolforge

The current development version is available on gitlab.wikimedia.org, and can be installed directly from the git repository:

python3 -m pip install --upgrade git+https://gitlab.wikimedia.org/toolforge-repos/python-toolforge.git

Connect to databases

The toolforge.connect() function simplifies connecting to a Wiki Replicas database. It reads your tool’s database credentials from $HOME/replica.my.cnf and picks the host to connect to based on the database name you pass.

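import toolforge

query = "SELECT page_title FROM page LIMIT 5"  # Example query; use your own

conn = toolforge.connect("enwiki")  # You can also use "enwiki_p"
# conn is a pymysql.connections.Connection object.
with conn.cursor() as cur:
    cur.execute(query)
    rows = cur.fetchall()  # Rows are returned as tuples by default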
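
To make the host selection less abstract, here is a minimal sketch of the naming convention it relies on. It is not the library's code: replica_target() is a hypothetical helper, and the host pattern (wiki name, then a cluster such as "web" or "analytics", then db.svc.wikimedia.cloud) and the "_p" database suffix are stated here as assumptions about the Wiki Replicas setup.

```python
from typing import Tuple


def replica_target(dbname: str, cluster: str = "web") -> Tuple[str, str]:
    """Return (host, database) for a name such as "enwiki" or "enwiki_p".

    Assumed convention: the host is <wiki>.<cluster>.db.svc.wikimedia.cloud
    and the replica database carries a "_p" suffix.
    """
    # Accept both spellings by stripping an optional "_p" suffix first.
    wiki = dbname[:-2] if dbname.endswith("_p") else dbname
    host = f"{wiki}.{cluster}.db.svc.wikimedia.cloud"
    return host, f"{wiki}_p"


print(replica_target("enwiki"))
print(replica_target("enwiki_p", cluster="analytics"))
```

Both spellings of the database name resolve to the same target, which is why either can be passed to toolforge.connect().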

The toolforge.toolsdb() function provides similar functionality for databases hosted on tools.db.svc.wikimedia.cloud.

Please keep the connection handling policy in mind: web tools should create a connection per request and close it afterwards, not open one during application initialization.
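
One way to follow the per-request rule is to keep the whole connection lifetime inside a single function call. The sketch below is an illustration under assumptions, not part of the library: run_query() is a hypothetical helper, and the connect argument stands for any callable that returns a DB-API style connection (toolforge.connect in a real tool).

```python
from contextlib import closing


def run_query(connect, dbname, query, params=None):
    """Open a connection for one request, run one query, then close it.

    `connect` is any callable returning a DB-API style connection whose
    cursors work as context managers (as PyMySQL cursors do).
    """
    # closing() guarantees conn.close() runs even if the query raises.
    with closing(connect(dbname)) as conn:
        with conn.cursor() as cur:
            cur.execute(query, params)
            return cur.fetchall()
```

In a web tool, a request handler would call something like run_query(toolforge.connect, "enwiki", query), so no connection outlives the request that opened it.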

Set a policy-compliant User-Agent

Set the default Requests user-agent to one that complies with the Wikimedia User-Agent policy:

import requests
import toolforge

toolforge.set_user_agent("mycooltool")
# Sets user-agent to:
# mycooltool (https://mycooltool.toolforge.org/; tools.mycooltool@toolforge.org) python-requests/2.28.2
requests.get("...")
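
The shape of that string can be reproduced in a few lines, which helps when you need to check what a tool will send. This is only a sketch based on the example comment above, not the library's implementation; policy_user_agent() and its client parameter are hypothetical names.

```python
def policy_user_agent(tool: str, client: str = "python-requests/2.28.2") -> str:
    """Build a User-Agent in the shape shown in the example comment.

    The tool name appears three times: as the product token, in the tool's
    URL, and in its contact address.
    """
    return (
        f"{tool} (https://{tool}.toolforge.org/; "
        f"tools.{tool}@toolforge.org) {client}"
    )


print(policy_user_agent("mycooltool"))
```

The trailing part identifies the HTTP client and will differ with the installed Requests version.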

For cases where the default Requests User-Agent is not used, the function also returns the string to use:

import mwapi
import toolforge

user_agent = toolforge.set_user_agent("mycooltool")
session = mwapi.Session("https://meta.wikimedia.org", user_agent=user_agent)
session.get(action="...")

Warning

set_user_agent() applies the User-Agent automatically by monkey patching the requests module. Any requests.Session object created before set_user_agent() is called will *not* automatically inherit the new User-Agent value.

Workarounds for this behavior include:

  • Calling set_user_agent() before importing an affected library.

    import toolforge
    
    toolforge.set_user_agent("...")
    import module_that_creates_session
    
  • Explicitly setting the requests.Session’s User-Agent header to the return value of your set_user_agent() call.

    user_agent = toolforge.set_user_agent("...")
    existing_session.headers["User-Agent"] = user_agent
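
The reason for the warning is easier to see with a small stand-in that does not need Requests installed. Everything in this sketch is hypothetical (fake_requests, Session, patch_user_agent); it only mimics the relevant behavior, namely that a session copies the default User-Agent when it is created.

```python
import types

# Stand-in for the requests module, so the sketch runs without it installed.
fake_requests = types.SimpleNamespace(
    default_user_agent=lambda: "python-requests/x.y"
)


class Session:
    """Stand-in session that copies the default header when it is created."""

    def __init__(self):
        self.headers = {"User-Agent": fake_requests.default_user_agent()}


def patch_user_agent(value):
    """Hypothetical patch: swap the module-level default, return the value."""
    fake_requests.default_user_agent = lambda: value
    return value


early = Session()  # created before the patch
user_agent = patch_user_agent("mycooltool (...)")
late = Session()  # created after the patch

early_before_fix = early.headers["User-Agent"]
early.headers["User-Agent"] = user_agent  # the explicit-header workaround

print(early_before_fix)  # the old default
print(late.headers["User-Agent"])  # the patched value
```

Only the session created after the patch picks up the new value on its own; the earlier one has to be updated by hand, which is what the second workaround does.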
    

Loading configuration files

To load configuration files with potentially sensitive information (e.g. OAuth credentials), you can use the toolforge.assert_private_file() decorator to wrap any other “load”-like function, for example from the standard library json module:

import json
import toolforge

with open("config.json", "r") as f:
    config = toolforge.assert_private_file(json.load)(f)

This ensures that the config file is not world-readable, and raises a toolforge.exceptions.PrivateFileWorldReadableError if it is. In that case, treat the secrets as leaked: recreate the config file with restrictive permissions and replace all secrets in it.

mv config.json config.json.leaked
install -m600 /dev/null config.json
# edit config.json
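
To check a file's permissions yourself before a tool starts, the standard library is enough. The following is a minimal sketch of a world-readable check on a POSIX system, not the library's implementation; is_world_readable() is a hypothetical helper.

```python
import os
import stat
import tempfile


def is_world_readable(path: str) -> bool:
    """Return True if the "other" read permission bit is set on the file."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


# Demonstrate on a throwaway file (POSIX permission semantics assumed).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name

os.chmod(path, 0o644)  # owner read/write, everyone else read
leaked = is_world_readable(path)

os.chmod(path, 0o600)  # owner read/write only, as install -m600 creates
private = not is_world_readable(path)

os.remove(path)
print(leaked, private)
```

Mode 600 is what the install command above produces, so a file created that way passes this kind of check.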

If you use a YAML configuration file and install the PyYAML library, you can also use the toolforge.load_private_yaml() function directly:

import toolforge

with open("config.yaml", "r") as f:
    config = toolforge.load_private_yaml(f)

Note that PyYAML is not a dependency of the toolforge library, and the function will only be available if PyYAML is otherwise installed. You should add it to your pyproject.toml, requirements.txt or other dependency list if you want to use it.

In a Flask tool, you can load the configuration like this:

app.config.from_file("config.yaml", load=toolforge.load_private_yaml)