HTTP: requests on steroids

The HTTP module is implemented on top of requests, but provides additional features. Refer to tenlib.http's documentation for details.

Example comparison: requests vs. ten

Let's say you want to log in as an administrator on a Drupal website. You want to proxy the traffic through Burp to check if everything works correctly. With requests, you'd have the following script:

import requests
import re

URL = 'http://site.com'

def main():
    session = requests.Session()
    session.verify = False

    session.proxies = {'http': 'http://localhost:8080', 'https': 'http://localhost:8080'}

    # GET request to get CSRF token, form ID, etc.

    response = session.get(URL + '/user/login')

    form_build_id = re.search('name="form_build_id" value="(.*?)"', response.text).group(1)
    form_token = re.search('name="form_token" value="(.*?)"', response.text).group(1)
    form_id = re.search('name="form_id" value="(.*?)"', response.text).group(1)

    # Log in

    response = session.post(
        URL + '/user/login',
        data={
            'user': 'someone',
            'pass': 'password1!',
            'form_build_id': form_build_id,
            'form_token': form_token,
            'form_id': form_id,
        }
    )

    if response.status_code != 200:
        print('Unable to log in')
        exit()

    if 'Welcome, admin' not in response.text:
        print('User is not an administrator')
        exit()

    print('Login successful!')

main()

With ten, you'd have:

from ten import *

URL = 'http://site.com'

@entry
def main():
    session = ScopedSession(URL)
    session.burp()

    response = session.get('/user/login')

    form = response.form(id='user-login')
    form.update({
        'user': 'someone',
        'pass': 'password1!'
    })
    response = form.submit()

    if not response.code(200):
        failure('Unable to log in')

    if not response.contains('Welcome, admin'):
        failure('User is not an administrator')

    msg_success('Login successful!')


main()

Faster, and more readable. But that's not all the http module can do.

Session

Creating a standard session and issuing HTTP requests

Create a session like so:

session = Session()

The API is the same as requests.Session's.

# Some GET request
response = session.get('https://site.com/', headers={...}, ...)
# Some POST request
response = session.post('https://site.com/user/login', data={...}, ...)

Creating a scoped session

When you're bound to send several requests to the same website, you often end up having to concatenate the base URL with the path. Instead, you can use a ScopedSession:

session = ScopedSession('http://target.com/admin')

You'd call methods like this:

# GET http://target.com/admin/login
response = session.get('/login')

If you request something that is out of scope, it'll raise an exception:

# raises HTTPOutOfScopeError
response = session.get('http://target.com/user')

Setting a proxy

The standard requests API requires you to set proxies as a dictionary. With ten, a string suffices:

session.proxies = "socks5://localhost:8888"

Proxying through Burp

If you need to debug some requests, you can call Session.burp() to set the proxy to localhost:8080.

session.burp()

When you're done, unset it like so:

session.unburp()

Raw URLs

By default, the URL's path is not re-evaluated by ten, allowing the use of non-canonical or un-encoded URLs:

# GET /portal/../admin?param=<xss> HTTP/1.1
response = session.get("https://target.com/portal/../admin?param=<xss>")

To go back to the requests behaviour, where the URL is canonicalized and GET parameters are re-encoded, set raw_url to False.

session.raw_url = False
# GET /admin?param=%3Cxss%3E HTTP/1.1
response = session.get("https://target.com/portal/../admin?param=<xss>")

HTTP Responses

Upon receiving an HTTP response, you generally make sure it is OK, by checking the HTTP status code or looking for keywords in the contents, and then parse the contents to extract data.
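
For instance, a typical flow combines the helpers described below (the /account endpoint and page markup are made up for the example):

response = session.get('/account')

# Exit unless the server answered 200 OK
response.expect(200)

# Check for a keyword, then extract data with a regex
if response.contains('Logged in as'):
    username = response.re.search(r'Logged in as <b>(\w+)</b>').group(1)
    msg_info(f'Current user: {username}')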

Status code

Compare the HTTP status code against several acceptable values:

if response.code(200, 302):
    ...

Exit when an unexpected status code is received:

response.expect(200)

Text matching

Use Response.contains() to quickly check for keywords, given as str or bytes:

if response.contains('login successful'):
    ...
if response.contains(b'login successful'):
    ...

Regular expressions

Every response object contains a re property that has the same API as the re module. It handles both str and bytes.

match = response.re.search(r'token:([0-9]+)')
changed = response.re.sub(
    br'\x00\x00\x7f.{5}', b''
)
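
Other functions from the re API work the same way; for instance, findall (the token pattern is only illustrative):

# Every numeric token in the raw body, as bytes
tokens = response.re.findall(rb'token:([0-9]+)')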

BeautifulSoup: checking the DOM

A BeautifulSoup object is available as response.soup:

# Every <p> tag in the document
p_tags = response.soup.find_all('p')

In addition, select() and select_one() let you access elements using CSS selectors:

token = response.select_one('input[name="token"]').attrs["value"]
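
Likewise, select() returns every matching element. For instance, to collect every link target (assuming the usual <a href> markup):

# The href attribute of every <a> tag
links = [a.attrs["href"] for a in response.select("a[href]")]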

Forms

Getting a form

Use the Response.form method to extract a form from a response:

login_form = response.form(id="user-login")

Any combination of HTML attributes can be used to select the form:

login_form = response.form(action="/user/login", method="POST")

The form method returns a Form object, which holds the form's data and can be used to submit it. For Drupal's login page, it looks like this:

Form(
    action='https://www.drupal.org/user/login',
    method='post',
    data={
        'name': '',
        'pass': '',
        'form_build_id': 'form-b2WxheXaaeswzS13Ypq5YhAWMJLRk8-fs_xT9VMceXw',
        'form_token': '9l0gj6ZY1OJBZ9I9ZKWvaNetevSRw2e5dHycs7SBzPs',
        'form_id': 'user_login',
        'op': 'Log in'
    }
)

Setting form values

Form values can be read/written as a dict:

csrf_token = form["token"]
form["user"] = "test@yopmail.com"
form["password"] = "Password123!"

or using the update() method:

form.update({"user": "test@yopmail.com", "password": "Password123!"})

The underlying dictionary is stored in form.data.
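
Since it is a plain dict, you can inspect it like any other, for instance to display every field the form would send:

for name, value in form.data.items():
    msg_info(f'{name} = {value}')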

Sending the form

You can then submit the form:

response = form.submit()

Multi, First, Pool: Send concurrent requests

Multi: run all concurrent requests

Oftentimes, you'll need to send multiple requests at the same time. For example, when you're fuzzing a parameter, you'll want to send a request for each payload. ten provides helpers to do so easily, and in a readable way.

Say you want to retrieve the first 10 news items from a website. You could do it like so:

session = ScopedSession("https://target.com/")
responses = [session.get(f"/news/{id}") for id in range(10)]

However, requests are done one after the other, which is inefficient. Using Multi, you can retrieve them concurrently:

session = ScopedSession("https://target.com/")
responses = session.multi().get(Multi(f"/news/{id}" for id in range(10)))

The Multi wrapper can appear anywhere in the call. If you want to issue POST requests to /api/news, with news_id going from 0 to 9, you can do:

session = ScopedSession("https://target.com/")
responses = session.multi().post("/api/news", data={"news_id": Multi(range(10))})

Even better, you can use several Multi keywords:

# Get news for each month and day of the year 2023
session = ScopedSession("https://target.com/")
responses = session.multi().post(
    "/api/news",
    data={
        "year": 2023,
        "month": Multi(range(1, 13)),
        "day": Multi(range(1, 32))
    }
)

This code produces 12 × 31 = 372 requests, all performed concurrently.
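
The responses come back as a list, so you can post-process them as usual. Assuming the API answers 200 only for dates that exist (invalid dates such as February 30 presumably yield an error status):

# Keep only the responses for valid dates
valid = [r for r in responses if r.code(200)]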

First: stop as soon as one request succeeds

Session.multi() will run every request to completion. In some cases, you might want to stop as soon as one request succeeds. For example, when you're fuzzing a parameter, you might want to stop as soon as you find a working payload. You can do so by using Session.first().

def news_exists(r: Response):
    return r.code(200) and r.contains("<title>News id")

first_news = session.first(news_exists).get(Multi(f"/news/{id}" for id in range(10)))

This code runs requests concurrently until one matches the news_exists predicate. It then returns the first response that matches, and cancels the remaining requests.

Pool: advanced concurrency

For more advanced usage, you can use Session.pool(), which produces a Pool object. Pool objects run requests concurrently: you can queue requests and retrieve responses as they come.

with session.pool() as pool:
    # Queue 10 requests
    for id in range(10):
        pool.get(f"/news/{id}")
    # Retrieve all responses, in order
    responses = pool.in_order()

In addition, you can get responses as they arrive using pool.as_completed():

with session.pool() as pool:
    # Queue 10 requests
    for id in range(10):
        pool.get(f"/news/{id}", tag=id)

    # Get responses, as they arrive
    for response in pool.as_completed():
        msg_info(f"Received {response.tag}: {response.status_code}")

The tag argument is optional, and can be used to identify responses.

As soon as you leave the with block, all pending requests are cancelled. Use this to keep only some of the responses, and cancel the requests you don't need:

with session.pool() as pool:
    for id in range(100):
        pool.get(f"/news/{id}", tag=id)

    for response in pool.as_completed():
        msg_info(f"Received {response.tag}: {response.status_code}")
        if response.code(200) and response.contains("News id"):
            break

# At this point, all pending requests have been cancelled
msg_success(f"Found a news item with ID {response.tag}")

Pools support adding new items while being iterated over. If you're building some kind of crawler, for instance, you might need to queue new requests whenever you find directories. Here is the sample code for a very simple crawler:

from urllib.parse import urljoin

s = ScopedSession(url)
# Canonicalize URLs so that duplicates are spotted reliably
s.raw_url = False

with s.pool() as pool:
    pool.get("/")
    done = set()
    for response in pool.as_completed():
        done.add(response.url)
        msg_info(response.url)

        # Directory listing: extract links and add them to the pool
        if response.contains("Index of "):
            urls = [a.attrs["href"] for a in response.select("a")]
            urls = {urljoin(response.url, u) for u in urls}
            urls = urls - done
            for url in urls:
                if s.is_in_scope(url) and url.endswith('/'):
                    pool.get(url)
        # File
        else:
            # Save to disk?
            ...