#434: Most of OpenAI’s tech stack runs on Python

Python Bytes

Topics covered in this episode:

  • Making PyPI’s test suite 81% faster
  • People aren’t talking enough about how most of OpenAI’s tech stack runs on Python
  • PyCon Talks on YouTube
  • Optimizing Python Import Performance
  • Extras
  • Joke
Watch on YouTube

About the show

Sponsored by Digital Ocean: pythonbytes.fm/digitalocean-gen-ai Use code DO4BYTES and get $200 in free credit

Connect with the hosts

  • Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky)
  • Brian: @brianokken@fosstodon.org / @brianokken.bsky.social
  • Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky)

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list. We'll never share it.

Brian #1: Making PyPI’s test suite 81% faster

  • Alexis Challande
  • The PyPI backend is a project called Warehouse
  • It’s tested with pytest, and it’s a large project, thousands of tests.
  • Steps for speedup
    • Parallelizing test execution with pytest-xdist
      • 67% time reduction
      • --numprocesses=auto allows for using all cores
      • DB isolation - cool example of how to configure Postgres to give each test worker its own DB
      • They used pytest-sugar to help with visualization, as xdist defaults to quite terse output
    • Use Python 3.12’s sys.monitoring to speed up coverage instrumentation
      • 53% time reduction
      • Nice example of using COVERAGE_CORE=sysmon
    • Optimize test discovery
      • Always use testpaths
      • Sped up collection time. 66% reduction (collection was 10% of time)
      • Not a huge savings, but it’s 1 line of config
    • Eliminate unnecessary imports
      • Use python -X importtime
      • Examine dependencies not used in testing.
      • Their example: ddtrace
        • A tool they use in production, but it also has a couple pytest plugins included
        • Those plugins caused ddtrace to get imported
        • Using -p no:ddtrace turns off the plugin bits
  • Notes from Brian:
    • I often get questions about whether pytest is useful for large projects.
    • Short answer: Yes!
    • Longer answer: But you’ll probably want to speed it up
    • I need to extend this article with a general purpose “speeding up pytest” post or series.
    • -p no:<plugin> can also be used to turn off any plugin, even built-in ones.
      • Examples include
        • nice to have developer focused pytest plugins that may not be necessary in CI
        • CI reporting plugins that aren’t needed by devs running tests locally
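
The per-worker database isolation mentioned in the speedup steps above could look roughly like the sketch below. It is not Warehouse's actual configuration: the function names and the "warehouse_test" base name are hypothetical. It relies only on two documented pytest-xdist behaviors: each worker process gets a PYTEST_XDIST_WORKER environment variable (such as "gw0"), and the worker id is reported as "master" when tests are not distributed.

```python
import os


def current_worker_id() -> str:
    """Return the pytest-xdist worker id, or "master" when not distributed."""
    # pytest-xdist sets PYTEST_XDIST_WORKER (e.g. "gw0") in each worker process.
    return os.environ.get("PYTEST_XDIST_WORKER", "master")


def worker_database_name(base_name: str, worker_id: str) -> str:
    """Build a database name that is unique to one test worker.

    Hypothetical helper: a fixture could create and drop a database with
    this name so parallel workers never share tables.
    """
    if worker_id == "master":
        # Single-process run: no suffix needed.
        return base_name
    return f"{base_name}_{worker_id}"
```

A session-scoped fixture in conftest.py could call worker_database_name("warehouse_test", current_worker_id()) and create that database before the first test runs in each worker.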

Michael #2: People aren’t talking enough about how most of OpenAI’s tech stack runs on Python

  • Original article: Building, launching, and scaling ChatGPT Images
  • Tech stack: The technology choices behind the product are surprisingly simple; dare I say, pragmatic!
    • Python: most of the product’s code is written in this language.
    • FastAPI: the Python framework used for building APIs quickly, using standard Python type hints. As the name suggests, FastAPI’s strength is that it takes less effort to create functional, production-ready APIs to be consumed by other services.
    • C: for parts of the code that need to be highly optimized, the team uses the lower-level C programming language
    • Temporal: used for asynchronous workflows and operations inside OpenAI. Temporal is a neat workflow solution that makes multi-step workflows reliable even when individual steps crash, without much effort by developers. It’s particularly useful for longer-running workflows like image generation at scale

Michael #3: PyCon Talks on YouTube

  • Some talks that jumped out to me:
    • Keynote by Cory Doctorow
    • 503 days working full-time on FOSS: lessons learned
    • Going From Notebooks to Scalable Systems
      • And my Talk Python conversation around it. (edited episode pending)
    • Unlearning SQL
    • The Most Bizarre Software Bugs in History
    • The PyArrow revolution in Pandas
      • And my Talk Python episode about it.
    • What they don't tell you about building a JIT compiler for CPython
      • And my Talk Python conversation around it (edited episode pending)
    • Design Pressure: The Invisible Hand That Shapes Your Code
    • Marimo: A Notebook that "Compiles" Python for Reproducibility and Reusability
      • And my Talk Python episode about it.
    • GPU Programming in Pure Python
      • And my Talk Python conversation around it (edited episode pending)
    • Scaling the Mountain: A Framework for Tackling Large-Scale Tech Debt

Brian #4: Optimizing Python Import Performance

  • Mostly pay attention to #'s 1-3
  • This is related to speeding up a test suite, speeding up necessary imports.
  • Finding what’s slow
    • Use python -X importtime <the rest of the command>
    • Ex: python -X importtime -m pytest
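
The raw -X importtime output is long, so a small script can help rank it. The sketch below is one possible approach, not something from the article: parse_importtime and slowest_imports are hypothetical helper names, and the parsing assumes the "import time: self | cumulative | imported package" line format that CPython writes to stderr.

```python
import subprocess
import sys


def parse_importtime(stderr_text: str) -> list[tuple[int, str]]:
    """Return (cumulative_microseconds, module_name) pairs, slowest first."""
    prefix = "import time:"
    rows = []
    for line in stderr_text.splitlines():
        if not line.startswith(prefix):
            continue  # unrelated stderr output
        fields = line[len(prefix):].split("|")
        if len(fields) != 3:
            continue
        cumulative = fields[1].strip()
        if not cumulative.isdigit():
            continue  # the header row has text in this column
        rows.append((int(cumulative), fields[2].strip()))
    return sorted(rows, reverse=True)


def slowest_imports(statement: str, limit: int = 5) -> list[tuple[int, str]]:
    """Run a statement in a fresh interpreter and rank its imports by cost."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_importtime(result.stderr)[:limit]
```

For example, slowest_imports("import json") lists the most expensive modules loaded while importing json, including interpreter startup imports. Timings vary between machines and runs.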