Hitting OSError: [Errno 24] Too many open files with APScheduler + requests

A while back I was running a scheduled crawler in Python to scrape some data and write it into a database. Since it was an experimental project, I chose PyPy (PyPy3) rather than CPython (the official Python implementation), running under Gunicorn (serving Flask) + APScheduler + requests; the project's code is at app/app.
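One common cause of this error in a long-running scheduled crawler is HTTP responses that are never closed, so each run leaks a file descriptor until the process limit is hit. A minimal sketch of the usual mitigation with requests (shared Session, explicit close via context manager; the URL and function name here are illustrative, not from the original post):

```python
import requests

# A single shared Session pools and reuses TCP connections
# instead of opening a fresh socket for every scheduled run.
session = requests.Session()

def fetch(url):
    # Using the response as a context manager returns its connection
    # to the pool when done, so repeated APScheduler jobs don't
    # accumulate open file descriptors over time.
    with session.get(url, timeout=10) as resp:
        resp.raise_for_status()
        return resp.text
```

Whether this matches the author's eventual fix is not stated in the excerpt; it is simply the standard first thing to check for Errno 24 with requests.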

Gea-Suan Lin's BLOG
The Python package urllib.parse is used for parsing URLs, and the unquote_to_bytes function within it is useful for URL-decoding bytes-like objects. #python #urllib #unquote_to_bytes
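For illustration, a quick look at how unquote_to_bytes differs from plain unquote:

```python
from urllib.parse import unquote_to_bytes, unquote

# unquote_to_bytes decodes percent-escapes into raw bytes,
# making no assumption about the text encoding.
print(unquote_to_bytes('a%26%23b'))       # b'a&#b'

# It also accepts bytes-like input directly:
print(unquote_to_bytes(b'%e2%82%ac'))     # b'\xe2\x82\xac' (UTF-8 for the euro sign)

# unquote, by contrast, returns str and decodes as UTF-8 by default:
print(unquote('%e2%82%ac'))               # '€'
```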
The #Python module "#urllib" looks too much like "#Urlaub" which tells me I really need a #vacation. #Deutsch

Last thing tonight: after some digging I realised my code could be a little better at getting the path of the cover art into a usable format for Mastodon.py

For some reason, #VLC insists on passing a URI around, whereas #Clementine just passes a filesystem path.

Anyway, sorted out now, so that in the VLC version of nowplaying, I unquote a urlparse'd path and throw that at Mastodon.

#urllib
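The unquote-a-urlparse'd-path step described above can be sketched as follows (the function name and sample URI are mine, not from the post):

```python
from urllib.parse import urlparse, unquote

def uri_to_path(uri):
    # VLC hands over a file:// URI with percent-escapes;
    # urlparse splits off the scheme, unquote decodes the escapes.
    parsed = urlparse(uri)
    if parsed.scheme == 'file':
        return unquote(parsed.path)
    # Clementine already passes a plain filesystem path, so pass it through.
    return uri

print(uri_to_path('file:///home/me/Music/Cover%20Art.jpg'))
# /home/me/Music/Cover Art.jpg
```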

Python urllib vulnerability report (CVE-2020-8492) | OSS Vulnerability Blog

On 01/30/2020, a vulnerability in Python's urllib (CVE-2020-8492) was published. This post gives a brief summary of the vulnerability and how each distribution has responded to it.

A crawler for the LaNación and Página12 websites that looks for the most frequently used words in the front-page articles.

Using python3 with urllib for the requests and beautifulsoup4 for the parsing

#noticias #argentina #crawler #python #pagina12 #lanacion #macri #gato #urllib #beautifulsoup #beautifulsoup4 #datos #analisis
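The urllib-plus-beautifulsoup4 pipeline described above might look roughly like this; the function names and the word-splitting heuristic are my own sketch, not the project's actual code:

```python
from collections import Counter
from urllib.request import urlopen
from bs4 import BeautifulSoup

def most_common_words(html, n=10):
    # Strip the markup with BeautifulSoup, then count the
    # alphabetic words in the remaining visible text.
    text = BeautifulSoup(html, 'html.parser').get_text(' ')
    words = [w.lower() for w in text.split() if w.isalpha()]
    return Counter(words).most_common(n)

def crawl(url):
    # Standard-library urllib does the fetching, as in the project.
    with urlopen(url) as resp:
        html = resp.read().decode('utf-8', errors='replace')
    return most_common_words(html)
```

A real front-page crawler would additionally need to follow article links and handle each site's encoding, but the counting core stays the same.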