Integrate API calls to facilitate PreTalx data analysis #8

Open
hamelin opened this issue Mar 28, 2024 · 2 comments

@hamelin

hamelin commented Mar 28, 2024

I have put together some tooling to pull submission (proposal) and review data out of PreTalx, which may be useful for assigning proposal reviews to volunteers.

First, both relevant PreTalx API endpoints authenticate with a fixed token that is already assigned to each user. To find yours, open your user profile page and scroll down to the API Access heading. The token is sent in the `Authorization` header as `Token <your token>`.
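As a minimal sketch using only the standard library (rather than `requests`), a single authenticated request could be built like this. The `PRETALX_TOKEN` environment variable and the helper names `read_token` and `build_request` are hypothetical, chosen only for illustration.

```python
import os
import urllib.request


def read_token(env_var: str = "PRETALX_TOKEN") -> str:
    """Read the API token from a (hypothetical) environment variable."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} to your PreTalx API token")
    return token


def build_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the PreTalx API token."""
    request = urllib.request.Request(url)
    # The header value is the word "Token", a space, then the token itself.
    request.add_header("Authorization", f"Token {token}")
    return request
```

For example, `urllib.request.urlopen(build_request(url, read_token()), timeout=30)` fetches the first page, and `json.load` on the response decodes it. Keeping the token in the environment avoids pasting it into notebooks or scripts that may end up in a repository.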

Second, both APIs are paginated: each call returns one page of either submissions or reviews, along with a URL (the `next` field) whose GET fetches the following page. The last page has an empty `next`. The following function pulls the whole sequence from either of these two endpoints:

```python
import math

import requests
from tqdm.auto import tqdm


def fetch_sequence(url1, token, max_queries=50):
    """Follow the "next" links starting at url1 and return all results as one list."""
    sequence = []
    url = url1
    num_queries = 0
    num_results_expected = None
    headers = {"Authorization": f"Token {token}"}

    # max_queries is only an initial guess for the progress bar; it is
    # corrected once the first page reports the total "count".
    with tqdm(total=max_queries) as progress:
        while url:
            response = requests.get(url, headers=headers, timeout=30)
            response.raise_for_status()
            data = response.json()
            progress.update()
            num_queries += 1

            assert "results" in data
            assert "next" in data

            if "count" in data:
                if num_results_expected is None:
                    num_results_expected = data["count"]
                    # Guard against dividing by zero on an empty first page.
                    if data["results"]:
                        max_queries = math.ceil(num_results_expected / len(data["results"]))
                        progress.reset(max_queries)
                        progress.update(num_queries)
                else:
                    # The total should not change while we page through it.
                    assert num_results_expected == data["count"]

            sequence.extend(data["results"])
            url = data["next"]  # empty/None on the last page, which ends the loop

    return sequence
```
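To check the page-following logic without hitting the live server, here is a minimal sketch that separates the loop from the HTTP call. `iter_results`, `get_page` and `max_pages` are hypothetical names; `get_page` stands in for whatever fetches and decodes one page.

```python
from typing import Any, Callable, Iterator, Optional


def iter_results(
    first_url: str,
    get_page: Callable[[str], dict],
    max_pages: int = 1000,
) -> Iterator[Any]:
    """Yield every result of a paginated endpoint by following "next" links."""
    url: Optional[str] = first_url
    pages_fetched = 0
    while url:
        # Stop if "next" never runs out (e.g. a server bug that loops).
        if pages_fetched >= max_pages:
            raise RuntimeError(f"gave up after {max_pages} pages")
        page = get_page(url)
        pages_fetched += 1
        yield from page["results"]
        url = page["next"]  # None on the last page ends the loop
```

With `requests`, one possible `get_page` is a small function that calls `requests.get(u, headers=headers, timeout=30)`, then `raise_for_status()`, and returns `.json()`. Injecting it this way lets the loop be tested against a dict of fake pages.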

The endpoints in question:

  1. Submissions: https://cfp.scipy.org/api/events/2024/submissions/
  2. Reviews: https://cfp.scipy.org/api/events/2024/reviews/

Easy peasy!
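As one possible next step toward assigning reviews to volunteers, here is a minimal sketch that lists submissions with too few reviews. It assumes each submission record has `code` and `title` keys and each review record has a `submission` key holding the reviewed submission's code; `min_reviews` is a hypothetical threshold.

```python
from collections import Counter


def submissions_needing_reviews(submissions, reviews, min_reviews=3):
    """Return (code, title, review_count) for submissions below min_reviews.

    Assumes submission dicts carry "code" and "title", and review dicts carry
    "submission" (the reviewed submission's code).
    """
    counts = Counter(review["submission"] for review in reviews)
    short = [
        (s["code"], s["title"], counts[s["code"]])  # Counter returns 0 if unseen
        for s in submissions
        if counts[s["code"]] < min_reviews
    ]
    # Least-reviewed first, so those get assigned to volunteers first.
    return sorted(short, key=lambda item: item[2])
```

The inputs would be the two lists returned by `fetch_sequence` for the submissions and reviews endpoints above.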

@hamelin
Author

hamelin commented Mar 28, 2024

@matthewfeickert here is the relevant API knowledge I hacked together.

@matthewfeickert
Contributor

Thanks @hamelin! I'm going to step through this on Friday, and I'll tag both you and @guenp for a PR review once I think I know what I'm doing. 👍
