## 1. pip

> 1. When you install a third-party package with pip, it searches the index at https://pypi.python.org/pypi; if the package is found, pip downloads its code and dependencies, builds them locally, and installs them into the local Python installation directory (typically `$(python install dir)\lib\site-packages`).
> 2. If you want your own Python project to be pip-installable by others, register an account on PyPI and publish the project there.
> 3. A pip-installable project must ship its own setup.py, which declares the package metadata and the dependencies the package needs (a minimal sketch for a project of your own is given at the end of this section).

Below is Scrapy's listing on PyPI:

![](https://box.kancloud.cn/7165e892249b7df30378c1e7d03fa7ce_1766x1010.png)

## 2. scrapy setup.py

~~~
from os.path import dirname, join
from pkg_resources import parse_version
from setuptools import setup, find_packages, __version__ as setuptools_version

# Read the package version from the scrapy/VERSION file
with open(join(dirname(__file__), 'scrapy/VERSION'), 'rb') as f:
    version = f.read().decode('ascii').strip()


def has_environment_marker_platform_impl_support():
    """Code extracted from 'pytest/setup.py'
    https://github.com/pytest-dev/pytest/blob/7538680c/setup.py#L31

    The first known release to support environment markers with range
    operators is 18.5, see:
    https://setuptools.readthedocs.io/en/latest/history.html#id235
    """
    return parse_version(setuptools_version) >= parse_version('18.5')


extras_require = {}

# Only on PyPy, pull in PyPyDispatcher as an extra dependency
if has_environment_marker_platform_impl_support():
    extras_require[':platform_python_implementation == "PyPy"'] = [
        'PyPyDispatcher>=2.1.0',
    ]
~~~

Build configuration: the most important piece is `entry_points`, which declares the mapping between a command and a function. When you run the `scrapy` command, you are actually calling the `execute` function of `scrapy.cmdline`, as declared in setup():

~~~
entry_points={
    'console_scripts': ['scrapy = scrapy.cmdline:execute']
},
~~~

~~~
setup(
    name='Scrapy',
    version=version,
    url='https://scrapy.org',  # could also point to a GitHub repository
    description='A high-level Web Crawling and Web Scraping framework',
    long_description=open('README.rst').read(),
    author='Scrapy developers',
    maintainer='Pablo Hoffman',
    maintainer_email='pablo@pablohoffman.com',
    license='BSD',
    packages=find_packages(exclude=('tests', 'tests.*')),
    include_package_data=True,
    zip_safe=False,
    entry_points={
        'console_scripts': ['scrapy = scrapy.cmdline:execute']
    },
~~~

The `classifiers` argument specifies the package's category, license, supported operating systems, and supported Python versions. You can find the full list of classifier values on PyPI at https://pypi.python.org/pypi?%3Aaction=list_classifiers, or retrieve it with a setuptools command by running `python setup.py register --list-classifiers` in the project root.

~~~
    classifiers=[
        'Framework :: Scrapy',
        'Development Status :: 5 - Production/Stable',
        'Environment :: Console',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: BSD License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: Implementation :: CPython',
        'Programming Language :: Python :: Implementation :: PyPy',
        'Topic :: Internet :: WWW/HTTP',
        'Topic :: Software Development :: Libraries :: Application Frameworks',
        'Topic :: Software Development :: Libraries :: Python Modules',
    ],
    python_requires='>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*',
~~~

~~~
    # Declare the dependencies Scrapy needs; pip downloads and installs
    # them automatically during `pip install`
    install_requires=[
        'Twisted>=13.1.0',
        'w3lib>=1.17.0',
        'queuelib',
        'lxml',
        'pyOpenSSL',
        'cssselect>=0.9',
        'six>=1.5.2',
        'parsel>=1.4',
        'PyDispatcher>=2.0.5',
        'service_identity',
    ],
    extras_require=extras_require,
)
~~~

This is the starting point of the crawler: it is what makes the `scrapy` command runnable.
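To make points 2 and 3 of the pip section concrete, here is a minimal sketch of a setup.py for publishing a project of your own. The package name `mycrawler`, the module path `mycrawler.cmdline`, and all version numbers are hypothetical placeholders, not anything taken from the Scrapy source:

~~~
# Hypothetical minimal setup.py for your own pip-installable project.
# "mycrawler" and "mycrawler.cmdline" are placeholder names; substitute
# your real package layout before publishing.
from setuptools import setup, find_packages

setup(
    name='mycrawler',              # the name users will `pip install`
    version='0.1.0',
    description='A small Scrapy-based crawler',
    packages=find_packages(exclude=('tests', 'tests.*')),
    # pip resolves and installs these automatically at install time
    install_requires=[
        'Scrapy>=1.5',
    ],
    # Maps the shell command `mycrawler` to the execute() function in
    # mycrawler/cmdline.py, just as Scrapy maps `scrapy` to
    # scrapy.cmdline:execute
    entry_points={
        'console_scripts': ['mycrawler = mycrawler.cmdline:execute'],
    },
)
~~~

With this file in place, running `pip install -e .` in the project root installs the package in development mode and puts a `mycrawler` command on your PATH; publishing the project to PyPI then makes it pip-installable for everyone else.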
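The target of a `console_scripts` entry point is just an ordinary Python callable. A hypothetical `mycrawler/cmdline.py` matching the entry point above might look like this:

~~~
# mycrawler/cmdline.py -- hypothetical target of the entry point above.
import sys


def execute(argv=None):
    """Called when the user runs the `mycrawler` command in a shell."""
    if argv is None:
        argv = sys.argv[1:]   # arguments typed after the command name
    print('starting crawl with arguments:', argv)
~~~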
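A side note on the setuptools version check in `has_environment_marker_platform_impl_support`: it compares with `parse_version` rather than plain strings, because string comparison orders versions lexicographically and gets multi-digit components wrong. A quick illustration:

~~~
from pkg_resources import parse_version

# Lexicographic string comparison: '9.0' sorts after '18.5' -- wrong
print('9.0' >= '18.5')                                # True
# parse_version compares real version components -- correct
print(parse_version('9.0') >= parse_version('18.5'))  # False
~~~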