#434: Most of OpenAI’s tech stack runs on Python

Python Bytes

Topics covered in this episode:

  • Making PyPI’s test suite 81% faster
  • People aren’t talking enough about how most of OpenAI’s tech stack runs on Python
  • PyCon Talks on YouTube
  • Optimizing Python Import Performance
  • Extras
  • Joke
Watch on YouTube

About the show

Sponsored by Digital Ocean: pythonbytes.fm/digitalocean-gen-ai Use code DO4BYTES and get $200 in free credit

Connect with the hosts

  • Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky)
  • Brian: @brianokken@fosstodon.org / @brianokken.bsky.social
  • Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky)

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list. We'll never share it.

Brian #1: Making PyPI’s test suite 81% faster

  • Alexis Challande
  • The PyPI backend is a project called Warehouse
  • It’s tested with pytest, and it’s a large project, thousands of tests.
  • Steps for speedup
    • Parallelizing test execution with pytest-xdist
      • 67% time reduction
      • --numprocesses=auto allows for using all cores
      • DB isolation - cool example of how to configure Postgres to give each test worker its own DB
      • They used pytest-sugar to help with visualization, as xdist defaults to quite terse output
    • Use Python 3.12’s sys.monitoring to speed up coverage instrumentation
      • 53% time reduction
      • Nice example of using COVERAGE_CORE=sysmon
    • Optimize test discovery
      • Always use testpaths
      • Sped up collection time. 66% reduction (collection was 10% of time)
      • Not a huge savings, but it’s 1 line of config
    • Eliminate unnecessary imports
      • Use python -X importtime
      • Examine dependencies not used in testing.
      • Their example: ddtrace
        • A tool they use in production, but it also has a couple pytest plugins included
        • Those plugins caused ddtrace to get imported
        • Using -p no:ddtrace turns off the plugin bits
  • Notes from Brian:
    • I often get questions about whether pytest is useful for large projects.
    • Short answer: Yes!
    • Longer answer: But you’ll probably want to speed it up
    • I need to extend this article with a general purpose “speeding up pytest” post or series.
    • -p no:<plugin name> can also be used to turn off any plugin, even builtin ones.
      • Examples include
        • nice to have developer focused pytest plugins that may not be necessary in CI
        • CI reporting plugins that aren’t needed by devs running tests locally
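
As a rough illustration of the per-worker database idea above (a sketch, not Warehouse's actual configuration), the helper below derives a distinct database name from the pytest-xdist worker id. The function names and the `warehouse_test` base name are hypothetical; the `PYTEST_XDIST_WORKER` environment variable and the "master" id for non-distributed runs come from pytest-xdist.

```python
import os


def worker_database_name(base_name, worker_id):
    """Return a database name unique to one pytest-xdist worker.

    pytest-xdist's `worker_id` fixture yields "gw0", "gw1", ... when tests
    are distributed, and "master" when they run in a single process.
    """
    if worker_id == "master":
        # No distribution: keep the plain base name.
        return base_name
    return f"{base_name}_{worker_id}"


def current_worker_id():
    """Read the worker id that pytest-xdist exports to each worker process."""
    return os.environ.get("PYTEST_XDIST_WORKER", "master")


if __name__ == "__main__":
    # Hypothetical base name, for illustration only.
    print(worker_database_name("warehouse_test", current_worker_id()))
```

In a conftest.py, a session-scoped fixture could request pytest-xdist's `worker_id` fixture, pass it to `worker_database_name`, and create that database before any tests in the worker run.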

Michael #2: People aren’t talking enough about how most of OpenAI’s tech stack runs on Python

  • Original article: Building, launching, and scaling ChatGPT Images
  • Tech stack: The technology choices behind the product are surprisingly simple; dare I say, pragmatic!
    • Python: most of the product’s code is written in this language.
    • FastAPI: the Python framework used for building APIs quickly, using standard Python type hints. As the name suggests, FastAPI’s strength is that it takes less effort to create functional, production-ready APIs to be consumed by other services.
    • C: for parts of the code that need to be highly optimized, the team uses the lower-level C programming language
    • Temporal: used for asynchronous workflows and operations inside OpenAI. Temporal is a neat workflow solution that makes multi-step workflows reliable even when individual steps crash, without much effort by developers. It’s particularly useful for longer-running workflows like image generation at scale

Michael #3: PyCon Talks on YouTube

  • Some talks that jumped out to me:
    • Keynote by Cory Doctorow
    • 503 days working full-time on FOSS: lessons learned
    • Going From Notebooks to Scalable Systems
      • And my Talk Python conversation around it. (edited episode pending)
    • Unlearning SQL
    • The Most Bizarre Software Bugs in History
    • The PyArrow revolution in Pandas
      • And my Talk Python episode about it.
    • What they don't tell you about building a JIT compiler for CPython
      • And my Talk Python conversation around it (edited episode pending)
    • Design Pressure: The Invisible Hand That Shapes Your Code
    • Marimo: A Notebook that "Compiles" Python for Reproducibility and Reusability
      • And my Talk Python episode about it.
    • GPU Programming in Pure Python
      • And my Talk Python conversation around it (edited episode pending)
    • Scaling the Mountain: A Framework for Tackling Large-Scale Tech Debt

Brian #4: Optimizing Python Import Performance

  • Mostly pay attention to #'s 1-3
  • This is related to speeding up a test suite, speeding up necessary imports.
  • Finding what’s slow
    • Use python -X importtime <the rest of the command>
    • Ex: python -X importtime -m pytest
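
To make the `-X importtime` output easier to scan, here is a minimal sketch that runs a statement in a fresh interpreter and lists its slowest imports by cumulative time. The helper names `parse_importtime` and `slowest_imports` are hypothetical, and the parser assumes the usual `import time: self [us] | cumulative | imported package` layout that CPython writes to stderr.

```python
import subprocess
import sys


def parse_importtime(stderr_text):
    """Parse `python -X importtime` output into (cumulative_us, module) pairs."""
    prefix = "import time:"
    rows = []
    for line in stderr_text.splitlines():
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        _self_us, cumulative_us, name = (part.strip() for part in parts)
        if not cumulative_us.isdigit():
            continue  # skip the header row
        rows.append((int(cumulative_us), name))
    return rows


def slowest_imports(statement, top=5):
    """Run `statement` in a fresh interpreter and return its slowest imports."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    # The timing report is written to stderr, not stdout.
    rows = parse_importtime(result.stderr)
    return sorted(rows, reverse=True)[:top]


if __name__ == "__main__":
    for cumulative_us, name in slowest_imports("import json"):
        print(f"{cumulative_us:>10} us  {name}")
```

Sorting by cumulative time surfaces the top-level packages that pull in the most work, which is usually the place to start when deciding what to defer or drop.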
