


 



Python

2020-12-14 14:58:59






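Linux operations

Files and directories

Relative and absolute paths: . is the current directory, .. is the parent directory

pwd: print the current working directory

cd - returns to the previous directory; cd ~ returns to the home directory

ls -lh (long listing with human-readable sizes); ls *.txt (list files ending in .txt)

clear: clear the screen

history: show the command history

--help: show a command's help, e.g. ls --help

touch: create an empty file

Permissions: rwx (read, write, execute)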

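File and disk management

Wildcards and special symbols

|    pipe; also alternation ("or") in regular expressions
>    output redirection
>>   appending output redirection
<    input redirection
<<   here-document: reads input up to a delimiter line
~    the current user's home directory
`` and $()   command substitution: replaced by the output of the enclosed command
$    matches the end of a line (regex)
^    matches the start of a line (regex)
*    matches any string of characters (wildcard)
?    matches any single character (wildcard)
#    comment
&    run the program or script in the background
&&   run the next command only if the previous one succeeded
[]   a range or set of characters (regex, wildcard)
{}   brace expansion, generates a sequence (wildcard)
.    the current directory (a hard link to it)
..   the parent directory (a hard link to it)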

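more: paged display

The basic keys: press the space bar to move to the next page and b to go back one page.

Pipe |

The output of the command on the left of the pipe becomes the input of the command on the right. Example: cat 1.txt | grep consists

Redirection

> redirects output to a file, overwriting its contents; >> appends to the file. Both create the file if it does not exist.

Text search: grep

General form: grep [-options] 'search pattern' filename

View or concatenate files: cat

cat 1.txt 2.txt > 3.txt

Find files: find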

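Archive and extract: tar

Extract a compressed archive: tar zxvf archive-name. To extract into a specific directory, add -C (capital C) followed by the directory. To create a compressed archive instead, use c in place of x: tar zcvf archive-name files.

Packing files: tar [options] archive-name files
-c  create an archive
-v  verbose: list the files as they are processed
-f  specify the archive file name; the name must follow f directly, so f goes last in the option group
-t  list the files contained in the archive
-x  extract the archive

Compress and decompress: gzip

tar only packs files without compressing them; gzip compresses the packed archive, which is conventionally named xxxx.tar.gz. Usage: gzip [options] file
-d  decompress
-r  recurse into subdirectories

Compress and decompress: zip, unzip

The target file of zip needs no extension; .zip is added by default.
Compress: zip [-r] target (without extension) source
Decompress: unzip -d destination-directory archive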

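Create directories: mkdir

mkdir -p creates directories recursively: missing parent directories are created as well.

Remove directories: rmdir

You must be outside the directory and it must be empty; otherwise the removal fails.

Delete files or directories: rm

rm -r directory: delete a non-empty directory

Create links: ln

Without -s, ln creates a hard link: a second name for the same data on disk, so the link still works after the source file is deleted. With -s it creates a symbolic (soft) link, which is the more commonly used form. Usage:
ln source link        (hard link)
ln -s source link     (symbolic link)

Reference: https://www.cnblogs.com/luwenlong/articles/9025459.html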

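Users and permissions

ssh: remote login. ssh username@ip, e.g. ssh python@192.168.17.76

Ubuntu does not give root a known password by default; run sudo passwd in a terminal to set the root password.

useradd: create a user

passwd username: set a user's password

exit: log out of the current user

useradd -m username: create the user together with a home directory

userdel -r username: delete the user, including the home directory

su: switch user

su versus su -: su switches to the administrator's privileges but keeps the current environment, without the administrator's login scripts and search path; su - switches privileges and also loads the administrator's login scripts and search path.

Add and delete groups: groupadd, groupdel

Usage: groupadd group-name creates a group; groupdel group-name deletes it; cat /etc/group lists the groups

Change a user's group: usermod

Usage: usermod -g target-group username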

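Change file permissions: chmod

Symbolic form: chmod u/g/o/a +/-/= rwx file (u = owner, g = group, o = others, a = all)

chmod o+w file   add write permission for other users
chmod u-r file   remove read permission from the owner
chmod g=x file   set the group's permission to execute only, removing read and write

Numeric form: the rwx permissions can also be written as digits (r = 4, w = 2, x = 1, summed for each of owner, group, others)

For example, chmod u=rwx,g=rx,o=r filename is equivalent to chmod 754 filename

To apply the same permissions recursively to a directory tree, add -R. For example, chmod 777 test/ -R gives 777 permissions to everything under test.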
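The mapping between the symbolic and numeric forms can be checked with a few lines of Python. This is only an illustrative sketch; the helper name mode_to_octal is made up for this example and is not part of any tool.

```python
def mode_to_octal(symbolic: str) -> str:
    """Convert a 9-character permission string such as 'rwxr-xr--' to octal digits."""
    if len(symbolic) != 9:
        raise ValueError("expected 9 characters, e.g. 'rwxr-xr--'")
    weights = {"r": 4, "w": 2, "x": 1, "-": 0}
    digits = []
    for start in (0, 3, 6):  # owner, group, others
        triple = symbolic[start:start + 3]
        digits.append(str(sum(weights[ch] for ch in triple)))
    return "".join(digits)


print(mode_to_octal("rwxr-xr--"))  # 754, i.e. u=rwx,g=rx,o=r
print(mode_to_octal("rwxrwxrwx"))  # 777
```

Each group of three letters is summed independently, which is why u=rwx,g=rx,o=r collapses to the three digits 7, 5 and 4.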

Change a file's owner: chown

Usage: chown username file-or-directory

Change a file's group: chgrp

Usage: chgrp group-name file-or-directory

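System management

cal: show the current calendar

date: show or set the time

reboot -f: force a reboot

shutdown [options] [arguments]: power off

ps -aux: show process information (snapshot)

top: show process information (live); press q to quit

kill -9 pid: forcibly terminate a process

df -h: show disk usage with human-readable units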

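Editors and servers

vim

:x saves and quits, much like :wq! (it writes the file only if it has been modified)

Insert mode

i  insert at the cursor position
I  insert at the start of the line
a  append after the current character
A  append at the end of the line
o  open a new line below
O  open a new line above

Mirrors

Mirror source: https://mirror.tuna.tsinghua.edu.cn/help/ubuntu/

Replace the contents of /etc/apt/sources.list (switch to the superuser first, e.g. with su -)

Update the package index: apt-get update

sudo apt-get install/remove package-name: install or remove a package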

FTP server (File Transfer Protocol)

Reference: https://blog.csdn.net/qq_26442553/article/details/81411261

1. Install: sudo apt-get install vsftpd
2. Edit the configuration file: sudo vi /etc/vsftpd.conf
3. Restart the service: sudo /etc/init.d/vsftpd restart

lrzsz (copy files between Windows and Linux)

Install: sudo apt-get install lrzsz

rz (upload), sz (download)

scp (copy files between Linux servers over ssh)

Download a file from a server: scp username@servername:/path/filename /var/www/local_dir (local directory). Example: scp ubuntu@134.175.71.163:/tmp/setRps.log /home/leejuny/

Upload a local file to a server: scp /path/filename username@servername:/path

Download a whole directory from a server: scp -r username@servername:/var/www/remote_dir/ (remote directory) /var/www/local_dir (local directory)

Upload a directory to a server: scp -r local_dir username@servername:remote_dir

Python basic syntax

Python basics

Problems

Garbled non-ASCII text: put # encoding:utf-8 at the top of the source file

Comments

Single-line comment: #

Multi-line comment: ''' ... '''

print function

Syntax: print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
sep: replaces the default space separator, e.g. sep='|' separates the values with |
end: defaults to "\n" (newline); with end="" no newline is printed
file: the output target of print(), sys.stdout by default. Example:
>>> with open('haha.txt', 'w') as s:
...     print("蓝田日暖玉生烟", file=s)
flush: controls output buffering, False by default
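A small sketch of the sep, end, file and flush parameters together. To stay self-contained it writes to an in-memory io.StringIO buffer rather than a real file; that substitution is an assumption of this example, and a file object from open() behaves the same way.

```python
import io

buffer = io.StringIO()  # in-memory stand-in for a file opened with open(...)

print("a", "b", "c", sep="|", file=buffer)        # writes "a|b|c" plus a newline
print("no newline", end="", file=buffer)          # end="" suppresses the newline
print(" <- same line", file=buffer, flush=True)   # flush=True forces the write out

output = buffer.getvalue()
print(output)
```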

Variables and types

type() shows which data type a value belongs to.

str() or repr() converts a number to a string

A backslash at the end of a line continues a string literal on the next line:
>>> print('aaa\
... eee')

Raw strings

A raw string starts with r; it does not treat the backslash as a special character. Example: r'G:\publish\codes\02'

Identifiers and keywords

Naming rules: an identifier must start with a letter (upper or lower case) or "_", followed by zero or more letters, digits or "_". Identifiers are used as names of variables, functions, classes, methods and so on, and cannot contain spaces.

Python keywords

>>> import keyword
>>> keyword.kwlist
['False', 'None', 'True', 'and', 'as', 'assert', 'async', 'await', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']

Formatted output

age = 18
name = "xiaohua"
print("My name is %s and my age is %d" % (name, age))

>>> for i in range(10):
...     print(i, end='**')   # no newline; '**' is appended after each value

Escape characters

\n: newline

Delay

>>> import time
>>> time.sleep(3)

Input

Use input() to read user input; the result is always a string:
>>> a = input('Prompt: ')
Prompt: 100+99
>>> a
'100+99'

Operators

Arithmetic: + - * /, % remainder, ** power, // floor division (returns the integer part of the quotient)

Comparison: == != > < >= <= (the <> spelling of "not equal" exists only in Python 2)

Assignment: = += -= *= /= %= **= //=

Logical: and or not

>>> "hello" * 3
'hellohellohello'

Conditional (ternary) expression

value_if_true if condition else value_if_false
Example: print("a>b") if a > b else print("a<=b")
Nested: print("a>b") if a > b else print("a<b") if a < b else print("a=b")

Membership: in, not in

Exercise: write a sensitive-word filter. Ask the user for a comment. If the comment contains words from the sensitive-word list li, replace each such word with asterisks of the same length (a three-character word becomes ***) and add the result to a list; if the comment contains no sensitive words, add it to the list unchanged.
li = ["苍老师", "东京热", "武藤兰", "波多野结衣"]
content = '你知道苍老师，波老师，东京热吗？'
for i in li:
    if i in content:
        content = content.replace(i, '*' * len(i))
print(content)

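Type conversion

int(x [,base])           convert x to an integer
float(x)                 convert x to a floating-point number
complex(real [,imag])    create a complex number
str(x)                   convert object x to a string
repr(x)                  convert object x to an expression string
eval(str)                evaluate a valid Python expression held in a string and return the resulting object
tuple(s)                 convert sequence s to a tuple
list(s)                  convert sequence s to a list
set(s)                   convert to a mutable set
dict(d)                  create a dictionary; d must be a sequence of (key, value) tuples
frozenset(s)             convert to an immutable set
chr(x)                   convert an integer to a character
ord(x)                   convert a character to its integer value
hex(x)                   convert an integer to a hexadecimal string
oct(x)                   convert an integer to an octal string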
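A few of the conversions above in action, as a quick sketch. The variable names are chosen only for this example. ast.literal_eval is shown next to the list because it is the standard-library way to parse literal values without executing arbitrary code, which eval() would do.

```python
import ast

from_hex = int("ff", 16)                  # base argument: parse as hexadecimal
from_bin = int("101", 2)                  # parse as binary
as_float = float("3.5")
hex_text, oct_text = hex(255), oct(8)     # '0xff', '0o10'
letter, code = chr(65), ord("A")          # 'A', 65
chars = list("abc")                       # ['a', 'b', 'c']
pairs = dict([("k1", 1), ("k2", 2)])      # built from (key, value) tuples
quoted = repr("hi")                       # "'hi'", the quotes are part of the result
parsed = ast.literal_eval("[1, 2, 3]")    # safe alternative to eval() for literals

print(from_hex, from_bin, as_float, hex_text, oct_text)
print(letter, code, chars, pairs, quoted, parsed)
```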

Running a Python program from cmd, for example: python E:\filename.py

isinstance() checks whether a value is an instance of the given type, e.g. isinstance(2, int)

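Conditionals and loops

if statements

Nested if

if expression1:
    statement
    if expression2:
        statement
    elif expression3:
        statement
    else:
        statement
elif expression4:
    statement
else:
    statement

if..else

age = input("Enter your age: ")
if int(age) >= 18:
    print("You are an adult, you may...")
else:
    print("You are a minor, you may not...")

In an if..elif..else chain, always handle the case with the narrower range first.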
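Why the narrower case has to come first can be seen in a short sketch. The function name describe_age and the age bands are invented for illustration only.

```python
def describe_age(age: int) -> str:
    # The narrowest condition (age > 60) is tested first. If `age > 20` came
    # first, it would also catch 45 and 70, and the later branches would
    # never run.
    if age > 60:
        return "senior"
    elif age > 40:
        return "middle-aged"
    elif age > 20:
        return "young adult"
    else:
        return "youth"


for sample in (70, 45, 25, 10):
    print(sample, describe_age(sample))
```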

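random library functions

random(): returns a random float in the range 0 to 1 (1 excluded)

randint(a, b): returns a random integer from a to b, both ends included

import random
print(random.randint(0, 3))

False, None, 0, "", (), [] and {} are treated as False when used as a boolean expression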
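The truthiness rule can be checked directly with bool(). A minimal sketch; the two lists are just sample values picked for this example.

```python
falsy_values = [False, None, 0, 0.0, "", (), [], {}, set()]
truthy_values = [" ", "0", [0], (None,), {"k": None}]

print([bool(v) for v in falsy_values])    # every entry is False
print([bool(v) for v in truthy_values])   # every entry is True

# Typical use: an empty container skips the branch without an explicit len() check.
names = []
if not names:
    print("no names yet")
```

Note that a non-empty container is truthy even when its contents are falsy, as [0] and (None,) show.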

assert

age = int(input("Enter your age: "))
assert 20 < age < 80
print('Your age is valid')   # runs only if the assertion holds; otherwise AssertionError is raised

pass: the empty statement

while loop

i = 0
while i <= 10:
    print("I was wrong, I'm sorry!")
    i += 1

i = 1
total = 0
while i <= 100:
    total += i
    i += 1
print("....%d..." % total)

Multiplication table
i = 1
while i <= 9:
    j = 1
    while j <= i:
        print("%d*%d=%-2d " % (j, i, i * j), end=' ')
        j += 1
    print()   # end the current row
    i += 1

for loop

for variable in iterable:

import time
name = "dongGe"
for temp in name:
    print("%s" % temp)
    time.sleep(1)

vim for.py +8 opens for.py with the cursor on line 8

a = range(10)
a_list = [x * x for x in a]   # a for expression (list comprehension)

break and continue

break ends the loop immediately:
name = "baoqiang"
for x in name:
    print("-----")
    if x == 'i':
        break
    print(x)

name = "baoqiang"
for x in name:
    print("-----")
    if x == 'i':
        continue
    print(x)

continue skips the rest of the current iteration, whereas break leaves the whole loop.

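Strings, lists, tuples, dictionaries

String (str)

Strings cannot be modified in place; changing the case of a string actually produces a new string that replaces the original.

>>> name = 'leejuny'
>>> name[1]

Slice syntax: str[start:stop:step], start included and stop excluded
>>> name = 'baoqiang'
>>> print(name[0:6:2])
name[-3:-1]   # from the third-to-last character up to, but not including, the last
name[::-1]    # reversed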
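The slices above, evaluated on the same sample string. This is a plain sketch using only built-in slicing; the variable names are chosen for the example.

```python
name = "baoqiang"

every_second = name[0:6:2]   # indexes 0, 2, 4
near_end = name[-3:-1]       # indexes 5 and 6; the stop index is excluded
backwards = name[::-1]       # a negative step walks the string in reverse
head, tail = name[:3], name[3:]

print(every_second)   # boi
print(near_end)       # an
print(backwards)      # gnaiqoab
print(head, tail)     # bao qiang
```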

# Index of the first match
print(str1.find('or'))               # 8
print(str4.isdigit())                # whether the string consists only of digits
print(str4.replace('city', '城市'))  # replace content; partial replacement is possible too
print(str4.count('s'))               # count occurrences

Splitting strings
string = "www.gziscas.com.cn"
1. Split on '.'
print(string.split('.'))
['www', 'gziscas', 'com', 'cn']
2. Split at most twice
print(string.split('.', 2))
['www', 'gziscas', 'com.cn']

>>> '1,2,,3,'.split(',')
['1', '2', '', '3', '']

Common case-conversion methods: title() capitalizes the first letter of each word; lower(); upper()

Removing whitespace: strip() removes whitespace from both ends of the string; lstrip(); rstrip()

Searching and replacing:
find(): returns the position of a substring, or -1 if it is not found
index(): returns the position of a substring, or raises an error if it is not found
replace(): replace
startswith(), endswith(): test whether the string starts or ends with the given substring
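The difference between find() and index(), together with the other string methods listed, in a short sketch. The sample text is invented for the example.

```python
text = "hello world"

found = text.find("or")        # position of the first match
missing = text.find("xyz")     # -1 when there is no match

try:
    text.index("xyz")          # index() raises instead of returning -1
    index_raised = False
except ValueError as exc:
    index_raised = True
    print("index() raised:", exc)

replaced = text.replace("world", "Python")
titled = text.title()          # first letter of every word
stripped = "  padded  ".strip()

print(found, missing, replaced, titled, repr(stripped))
print(text.startswith("hello"), text.endswith("world"))
```

find() suits a quick check where -1 is easy to test for; index() suits code where a missing substring is a real error.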

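List (list)

Lists support adding, removing, modifying and looking up elements; elements are accessed by index with []

Add

append()   # add one element to the end of the list

stu_name.append('杨月')   # add one element to the end of the list

insert()   # add an element at the given position

extend()   # append the elements of another list to the end

L1 + L2   # concatenate

L1 * 3   # repeat

Remove

remove()   # remove the given element

stu.remove('222')

pop()   # remove the last element

pop(index)   # remove the element at the given index

del list / del list[index]

clear()   # empty the list

Modify

list[index] = 'XXXX'   # change the value at the given index

List.reverse()   # reverse in place

stu.sort(reverse=True)   # sort in descending order

stu.sort()   # sort, ascending by default

Look up

stu[0]   # the first element

stu[-1]   # the last element

stu.count('jack')   # how many times an element occurs in the list

stu.index('jack')   # the index of the given element

list() converts tuples, ranges and other iterables to a list. tuple() converts lists, ranges and other iterables to a tuple.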

List traversal examples

filenames = ["01.py", "02.txt", "03.rar", "04.cpp", "05.cpp", "06.doc"]
# With a for loop:
for temp in filenames:
    position = temp.rfind('.')
    print(temp[position + 1:])
# With a while loop:
i = 0
while i < len(filenames):
    temp = filenames[i]
    position = temp.rfind('.')
    print(temp[position + 1:])
    i += 1

# Randomly assign 8 teachers to 3 offices, with at least 2 teachers per office
import random
offices = [[], [], []]
teachers = ['1', '2', '3', '4', '5', '6', '7', '8']
# Random assignment. Capping every office at 3 teachers forces a 3/3/2 split,
# so no office ends up with fewer than 2.
j = 0
while j < 8:
    i = random.randint(0, 2)
    if len(offices[i]) > 2:
        continue
    else:
        offices[i].append(teachers[j])
        j += 1
print(offices)

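Tuple (tuple)

A tuple is a read-only list written with (): the data in a tuple cannot be modified

Dictionary (dict)

A dictionary is written with {} and is looked up by key, not by index. (Since Python 3.7 dictionaries keep insertion order; earlier versions were unordered.) Keys are unique: if a key is repeated, the last key-value pair replaces the earlier ones. Values do not need to be unique.

newNames = {'name1': 'aaa', 'name2': 'bbb', 'name3': 'ccc'}
>>> newNames['name3'] = 'CCC'   # adds the key or updates its value
newNames['name4']               # raises an error; use newNames.get('name4') instead
newNames.get('name4', 100)      # returns the default value 100

>>> for i, j in info.items():
...     print(i, j)
items(), keys() and values() return all key-value pairs, all keys and all values of the dictionary respectively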

Common operations

Modify

dict['Age'] = 8

Add

newNames['name3'] = 'CCC'   # adds the key or updates its value

Delete

del and clear: del newNames deletes the dictionary itself; newNames.clear() empties it. pop(): newNames.pop('A') removes the key 'A'

get() looks up a value by key; update() updates values by key
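The dictionary operations above, run end to end in a small sketch. The dictionary contents are sample data invented for this example.

```python
info = {"name": "xiaohua", "age": 18}

no_default = info.get("city")                   # None: missing key, no exception
with_default = info.get("city", "unknown")      # the fallback value is returned

info.update({"age": 19, "city": "Beijing"})     # overwrites 'age', adds 'city'
after_update = dict(info)                       # copy, to inspect later

removed = info.pop("name")                      # removes the key and returns its value
del info["age"]                                 # removes a key without returning it
after_delete = dict(info)

info.clear()                                    # empties the dictionary, which still exists
print(no_default, with_default, after_update, removed, after_delete, info)
```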

Traversal

>>> for i in info.items():
...     print(i)

>>> for x, y in info.items():
...     print("x=%s,y=%s" % (x, y))   # get the key and the value

Common utility functions

zip() combines two lists into a zip object (an iterable), so both lists can be traversed in a single loop.
books = ['一千零一夜', '白夜最凶', '射雕']
prices = [79, 69, 89]
for book, price in zip(books, prices):
    print('%s costs %5.2f' % (book, price))

reversed() accepts any sequence (tuple, list, range and so on) and returns an iterator over it in reverse order.
[x for x in reversed(range(10))]

sorted() returns a new, sorted list
sorted(a, reverse=True)   # descending
sorted(a)                 # ascending

Functions

Defining and calling functions

def fun():                 # define a function; fun is the function name
    print("Hello World")   # function body
fun()                      # call the function

Function parameters

def calc(x, y):    # x and y are the formal parameters
    print(x * y)   # print x multiplied by y
calc(5, 2)         # call the function; 5 and 2 are the actual arguments
calc(y=5, x=2)     # arguments passed by name, in any order

Positional parameters

# name and sex are positional (required) parameters
def my(name, sex):
    print(name, sex)
    return name
my('wwww', 'male')

Default parameters

# port=3306 is a parameter with a default value
def connect(ip, port=3306):
    print(ip, port)
# If a port is given, the given value is used
connect('118.1.1.1', 3307)
# If it is omitted, the default is used
connect('118.1.1.1')

Variable-length arguments

# Example: sending alert SMS messages. A * before the parameter collects the arguments
def send_sms(*phone_num):
    # Option 1: print the whole tuple
    print(phone_num)
    # Option 2: loop and print each element instead of the whole tuple
    # for p in phone_num:
    #     print(p)
send_sms()                # no arguments
send_sms(150)             # one argument
send_sms(151, 152, 153)   # N arguments; they arrive as a tuple

>>> def test2(*n):
...     total = 0
...     for x in n:
...         total += x
...     return total

Keyword arguments

# Keyword arguments are collected with ** and arrive as a dictionary
def send_sms2(**phone_num):
    print(phone_num)
send_sms2()
send_sms2(name='xiaohei', sex='nan')
send_sms2(addr='北京', country='中国', aa='hahaha')

>>> test3(1, 22, 23, age='lijun')

Return values: return

def add3num(a, b, c):
    total = a + b + c
    return total
result = add3num(11, 22, 33)
print("%d" % result)

return x, y, z   # returns a tuple

Nested function calls

https://blog.csdn.net/title71/article/details/80464427

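Local and global variables

To rebind a global variable inside a function, declare it with the global keyword. A global list or dictionary can be modified in place without global, because changing its contents does not rebind the name.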
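A minimal sketch of the rule: rebinding a global name needs global, while mutating a global list does not. The names counter, items, increment and add_item are made up for this example.

```python
counter = 0
items = []


def increment():
    global counter        # required: `counter += 1` rebinds the name
    counter += 1


def add_item(value):
    items.append(value)   # mutates the existing list; the name is not rebound


increment()
increment()
add_item("a")
print(counter, items)
```

Without the global line, increment() would fail with UnboundLocalError, because the assignment makes counter a local name inside the function.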

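Recursive functions

def test1(num):
    if num > 1:
        result = num + test1(num - 1)
    else:
        result = 1
    return result
result = test1(10)
print(result)

Anonymous functions

References

The address of a piece of data in memory is its reference. If two variables hold the same reference, their data is necessarily the same; if two variables hold equal data, their references are not necessarily the same. id(data) shows the address of the data. Assigning a new value to a variable actually changes the reference the variable holds.

Numbers each occupy their own space

Mutable types:

Modifying the content does not change the address of the data. Examples: lists, dictionaries

Immutable types:

Modifying the content produces data at a different address. Examples: strings, tuples, numbers

For a mutable object such as a list, a += a modifies the original data in place, whereas a = a + a first builds a new object and then binds a to it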
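The in-place versus rebinding behaviour can be observed with the `is` operator, which compares references. A short sketch with sample lists and a string; the alias variables exist only to keep a second reference to the original object.

```python
a = [1, 2]
alias_a = a
a += a                               # in place: the same list object is extended
same_after_iadd = alias_a is a       # both names still refer to one object

b = [1, 2]
alias_b = b
b = b + b                            # builds a new list, then rebinds b
same_after_add = alias_b is b        # alias_b still refers to the old list

s = "ab"
alias_s = s
s += s                               # str is immutable: += creates a new string
same_for_str = alias_s is s

print(same_after_iadd, alias_a)      # True [1, 2, 1, 2]
print(same_after_add, alias_b)       # False [1, 2]
print(same_for_str, alias_s)         # False ab
```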

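Object-oriented programming

Classes and objects

A class is an abstract description of a kind of thing: an abstract data type, a template. An object represents one individual of that kind, that is, a concrete realization of the class's description. An object is a concrete instance of a class, and the class is the template for the object. Objects are created from a class, and one class can create many objects.

Structure of a class

A class is defined with the class keyword; its body consists of attributes (variables) and methods (functions).

Defining a class

# Create a School class with a student method
class School:   # class names start with a capital letter
    def student(self):
        pass
a1 = School()

Creating objects

# Define a class
class Dog:
    def __init__(self):   # runs automatically on creation
        self.weight = 5
        self.color = 'yellow'
    # Define a method
    def sleep(self):
        print("www...")
    def __str__(self):
        return 'XXXX'
# Create a puppy
xiaogou = Dog()
# Print the puppy object, which calls its __str__ method
print(xiaogou)

__init__(self) method

Runs automatically when an object is created

Understanding self

__str__(self) method

Generally used for testing

Hiding data

1. Modify the attribute directly through the object: digua.cookedLevel = 5 (digua being a SweetPotato object)

2. Modify it indirectly through a method: digua.cook(5) (recommended)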

Application: roasting a sweet potato

class SweetPotato:
    # Initialization: set the default attributes
    def __init__(self):
        self.cookedLevel = 0
        self.cookedString = 'raw'
        self.condiments = []

    # Customize what print() shows for this object
    def __str__(self):
        msg = "Your sweet potato is " + self.cookedString
        if len(self.condiments) > 0:
            msg += ", condiments added: "
            for temp in self.condiments:
                msg = msg + temp + ","
            msg = msg.strip(",")   # cut off the trailing comma
        return msg

    # Roast the sweet potato
    def cook(self, time):
        self.cookedLevel += time
        if self.cookedLevel > 8:
            self.cookedString = 'burnt'
        elif self.cookedLevel > 5:
            self.cookedString = 'cooked'
        elif self.cookedLevel > 3:
            self.cookedString = 'half-cooked'
        else:
            self.cookedString = 'raw'

    # Add a condiment
    def addCondiments(self, temp):
        self.condiments.append(temp)

# Create a sweet potato object
digua = SweetPotato()
print(digua)
print("---roasted for 2 minutes---")
digua.cook(2)
print(digua)
print("---roasted for another 2 minutes---")
digua.cook(2)
print(digua)
digua.cook(2)
print("---adding ketchup---")
digua.addCondiments("ketchup")
digua.addCondiments("蓝德")
print(digua)

Application: storing furniture

# Define a Home class
class Home:
    def __init__(self, area):
        self.area = area
        self.accommondateItem = []   # items the home contains
        # the light starts off
        self.light = 'off'

    def __str__(self):
        msg = "Available area: " + str(self.area) + "; light: " + self.light
        if len(self.accommondateItem) > 0:
            msg += "; contains: "
            # append() stored references to the objects, so temp here is
            # a reference to an object and can be used as the object itself
            for temp in self.accommondateItem:
                msg += temp.getBedName() + ","
            msg = msg.strip(",")
        return msg

    def containItem(self, item):   # takes a Bed object
        bedArea = item.getBedArea()   # accessor: reads the data without modifying it
        if self.area > bedArea:
            self.accommondateItem.append(item)
            self.area -= bedArea
            print("Added %s, available area is now %d" % (item.getBedName(), self.area))
        else:
            print("error")

    def turnOn(self):
        self.light = "on"
        # switch every piece of furniture in the home to the "on" state
        for temp in self.accommondateItem:
            # call the method that changes this item's light state
            temp.setLight()

# Define a Bed class
class Bed:
    def __init__(self, name, area):
        self.area = area
        self.name = name
        self.light = "off"

    def __str__(self):
        msg = self.name + " occupies area " + str(self.area) + "; light: " + self.light
        return msg

    def getBedArea(self):
        return self.area

    def getBedName(self):
        return self.name

    def setLight(self):
        self.light = "on"

    def setLightOff(self):
        self.light = "off"

home = Home(180)
bed = Bed("Simmons bed", 4)
bed2 = Bed("wooden bed", 10)
# Put the beds into the home so that turnOn() and print(home) have something to act on
home.containItem(bed)
home.containItem(bed2)
home.turnOn()
print(home)
print(bed)
print(bed2)

# A class written with (object) is a new-style class;
# the older form without it was a classic class (a Python 2 distinction: in Python 3 every class is new-style)
class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

Private attributes

# A class written with (object) is a new-style class;
# the older form without it was a classic class
class Person(object):
    def __init__(self, name, age):
        self.__name = name   # private attribute
        self.__age = age

    def __str__(self):
        return "Age: " + str(self.__age)

    def setNewAge(self, newAge):
        if newAge > 0 and newAge < 80:
            self.__age = newAge

xiaoming = Person("Xiaoming", 19)
xiaoming.setNewAge(120)   # rejected by the range check, so the age stays 19
print(xiaoming)

__del__

Triggered automatically when the object is released from memory

class Animal(object):
    def __init__(self, name):
        self.__name = name

    def __del__(self):
        print("---ah---")

dog = Animal("Wangcai")
dog1 = dog
dog2 = dog
print("----1----")
del dog
del dog1
del dog2   # the last reference is gone, so __del__ runs here
print("----2----")

Inheritance

class Animal(object):
    def __init__(self, name='animal', color='white'):
        self.name = name   # public attributes are inherited
        self.color = color

class Dog(Animal):   # inherits from Animal
    def printInfo(self):
        print("Name: %s" % self.name)
        print("Color: %s" % self.color)

wangcai = Dog(name="Wangcai")
wangcai.printInfo()

A method inherited from the parent class can access the parent's private attributes. A method defined in the subclass itself cannot access the parent's private attributes.
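The rule about private attributes follows from name mangling: inside a class body, `self.__x` is rewritten to `self._ClassName__x`. A minimal sketch; the class and method names are invented for the illustration.

```python
class Animal:
    def __init__(self):
        self.__secret = "hidden"   # actually stored as _Animal__secret

    def reveal(self):              # defined in Animal, so it sees _Animal__secret
        return self.__secret


class Dog(Animal):
    def peek(self):                # defined in Dog: this looks up _Dog__secret
        return self.__secret


dog = Dog()
inherited_result = dog.reveal()    # works: the inherited method reads the attribute

try:
    dog.peek()
    peek_failed = False
except AttributeError as exc:
    peek_failed = True
    print("peek() failed:", exc)

print(inherited_result, sorted(vars(dog)))
```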

Overriding and calling parent-class methods

class Animal(object):
    def bark(self):
        print("aaa")

class Cat(Animal):
    def bark(self):   # override
        # two ways to call the parent's bark method
        Animal.bark(self)
        super().bark()
        print("Meow...")

tom = Cat()
tom.bark()

Multiple inheritance

class A(object):
    def testA(self):
        print("---- A test-------")

class B(object):
    def testB(self):
        print("---- B test-------")

class C(A, B):
    pass

c = C()
c.testA()
c.testB()

Python 3 resolves methods through the method resolution order (MRO), which in the diamond below behaves like a breadth-first search

class Base(object):
    def test(self):
        print("---- Base test-------")

class A(Base):
    def testA(self):
        print("---- A test-------")
    # def test(self):
    #     print("---- A test-------")

class B(Base):
    def testB(self):
        print("---- B test-------")
    def test(self):
        print("---- B test-------")

class C(A, B):
    pass

c = C()
c.test()   # result: ---- B test-------
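The lookup order can be inspected directly through the `__mro__` attribute. The sketch below rebuilds the same diamond with methods that return strings instead of printing, purely so that the result is easy to check.

```python
class Base:
    def test(self):
        return "Base"


class A(Base):
    pass


class B(Base):
    def test(self):
        return "B"


class C(A, B):
    pass


mro_names = [cls.__name__ for cls in C.__mro__]
print(mro_names)      # ['C', 'A', 'B', 'Base', 'object']
print(C().test())     # B: found in B before Base is reached
```

A has no test method, so the search moves on to B before it ever reaches the shared Base class.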

Polymorphism

class Animal(object):
    def bark(self):
        print("Ah ah ah...")

class Cat(Animal):
    def bark(self):
        print("Meow meow...")

class Dog(Animal):
    def bark(self):
        print("Woof woof...")

class Robot(object):
    def bark(self):
        print("Buzz buzz...")

# Polymorphism: the same method is called, but different code runs
def animalBark(temp):
    temp.bark()

miaomi = Cat()
wangcai = Dog()
animalBark(miaomi)
animalBark(wangcai)
dingdang = Robot()
animalBark(dingdang)
# Result: Meow meow... Woof woof... Buzz buzz...

Class attributes and instance attributes

Exceptions

try..except...

# Catch-all exception: Exception
s1 = 'hello'
try:
    int(s1)
except Exception as e:
    print(e)

try:
    print('...test...1..')
    open('123.txt', 'r')
    print('...test...2..')
except IOError:
    print("ha ha ha")

try:
    print(num)
except (IOError, NameError):   # several exceptions at once
    print("ha ha ha")

try:
    print(num)
except NameError as e:
    print(e)
else:
    print("no exception was caught")
finally:
    print("I always run")

try..finally...: the final block runs whether or not an exception occurred

IOError: an input/output operation failed
NameError: an undeclared or uninitialized name
SyntaxError: syntax error
ValueError: value error; the argument has the right type but an unacceptable value
TypeError: type error
IndexError: index error

Raising exceptions: raise

Common exception syntax:
try...except
try...except...else
try...except...except (several handlers)
try...except (A, B) (catch several exceptions in one clause)
try...except...finally (try...finally also works)
try...except (nested)
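A sketch that combines raise with the try...except...else...finally form. The function parse_age and its range limits are invented for this example.

```python
def parse_age(text):
    """Return the age as an int, raising ValueError for bad input."""
    try:
        age = int(text)
    except ValueError as exc:
        # re-raise with a clearer message, keeping the original as the cause
        raise ValueError(f"not a number: {text!r}") from exc
    if not 0 < age < 150:
        raise ValueError(f"age out of range: {age}")
    return age


log = []
for raw in ["18", "abc", "999"]:
    try:
        value = parse_age(raw)
    except ValueError as exc:
        log.append(f"error: {exc}")   # runs only when an exception was raised
    else:
        log.append(f"ok: {value}")    # runs only when no exception was raised
    finally:
        log.append("checked")         # runs in both cases

print(log)
```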

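Files

Basic syntax

Two kinds of mode: text, and binary (marked with b)
Read only: r, rb
Write only: w, wb
Append: a, ab
r+ read and write; w+ write and read; a+ append and read

Reading files

read()        reads the whole file at once and returns a str
read(size)    reads at most size characters and returns a str; in text mode size counts characters, not bytes (one Chinese character takes 3 bytes in UTF-8)
readlines()   reads the whole file at once and returns a list of lines
readline()    reads one line at a time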
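The four read calls can be compared on a small file. The sketch writes its own temporary file first so that it is self-contained; the file name and contents are sample data for this example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "song.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("第一行\nsecond line\nthird line\n")

with open(path, "r", encoding="utf-8") as f:
    first_two_chars = f.read(2)   # two characters, even though they take six bytes
    rest_of_line = f.readline()   # continues from the current position to the line end
    remaining = f.readlines()     # every remaining line, as a list

with open(path, "r", encoding="utf-8") as f:
    whole_file = f.read()         # the entire content as one str

print(repr(first_two_chars), repr(rest_of_line), remaining)
print(len("第".encode("utf-8")))  # 3 bytes for one Chinese character in UTF-8
```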

# Method 1:
with open('song.txt', 'r', encoding='utf-8') as f:
    for line in f.readlines():
        print(line)

# Method 2 (a raw string keeps the backslashes in the Windows path literal):
f = open(r'C:\Users\Administrator\Desktop\song.txt', 'r', encoding='utf-8')   # step 1: open the file
print(f.read())   # step 2: read the content
f.close()         # step 3: close the file

Modifying files

Method 1: 1. read the whole file; 2. modify the content in memory; 3. write the modified content back over the file on disk

with open(r'def\wj.txt', 'r') as f:
    data = f.read()
data = data.replace("222", "===")   # modify the content
with open(r'def\wj.txt', 'w') as f:
    f.write(data)   # write the modified data back

Method 2: 1. open the source file for reading; 2. open a new file for writing

import os   # import the os module
with open('a.txt', 'r', encoding='utf-8') as read_f, \
        open('new.txt', 'w', encoding='utf-8') as new_f:   # open both files together
    for line in read_f:   # loop over the original content
        if '你好啊' in line:
            line = line.replace('你好啊', '哈哈哈哈哈哈')   # replace the content
        new_f.write(line)   # write each line to the new file
os.remove('a.txt')             # delete the original file
os.rename('new.txt', 'a.txt')  # rename the new file

Relative paths

../  the parent of the directory containing the current file
./   the directory containing the current file (can be omitted)
/    the root directory of the current site (the disk directory the domain maps to)

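Regular expressions: re

Introduction

A regular expression is mainly a way of filtering strings: "metacharacters" and "ordinary characters" are combined into a pattern, which is used to pick the wanted strings out of a known string or text.

Metacharacters

(?!...): negative lookahead, "not followed by"

Library

1. re.match

re.match tries to match a pattern at the start of the string; if the match does not succeed at the start, match() returns None.
Syntax: re.match(pattern, string, flags=0)
Parameters:
pattern   the regular expression to match
string    the string to match against
flags     flags that control how the expression is matched, such as case sensitivity or multi-line matching

2. re.search

re.search scans the whole string and returns the first successful match.
Syntax: re.search(pattern, string, flags=0)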
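The difference between the two calls shows up as soon as the pattern does not occur at position 0. A minimal sketch with a made-up sample string.

```python
import re

text = "order 66 shipped"

at_start = re.match(r"\d+", text)      # None: the string does not start with digits
anywhere = re.search(r"\d+", text)     # finds the first run of digits anywhere
prefix = re.match(r"order", text)      # succeeds: the pattern is at the start

print(at_start)
print(anywhere.group(), anywhere.span())
print(prefix.group())
```

Both functions return None on failure, so the result should be checked before calling group().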

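3. re.sub

The re module provides re.sub for replacing matches in a string.
Syntax: re.sub(pattern, repl, string, count=0, flags=0)
Parameters:
pattern   the regular expression pattern
repl      the replacement string; may also be a function
count     the maximum number of replacements; the default 0 replaces every match

4. re.compile

compile compiles a regular expression into a Pattern object, for use with match() and search().
Syntax: re.compile(pattern[, flags])
Parameters:
pattern   a regular expression in string form
flags     optional matching mode, such as ignoring case or multi-line mode:
re.I   ignore case
re.L   make \w, \W, \b, \B, \s, \S depend on the current locale
re.M   multi-line mode
re.S   make . match any character including newline (without it, . does not match newline)
re.U   make \w, \W, \b, \B, \d, \D, \s, \S follow the Unicode character properties database
re.X   verbose mode: ignore whitespace and comments after #, for readability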
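A sketch of re.sub with a count and with a function as repl, followed by compiled patterns that combine flags. The sample strings and variable names are invented for this example.

```python
import re

# re.sub: limit the number of replacements, or compute each replacement
limited = re.sub(r"\d", "#", "a1b2c3", count=2)
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")

# re.I ignores case, re.M lets ^ and $ match at line boundaries,
# re.S lets . match the newline between the two words
text = "HELLO\nWORLD\nbye"
flagged = re.compile(r"^hello.world$", re.I | re.M | re.S).search(text)
unflagged = re.search(r"^hello.world$", text, re.I | re.M)   # no re.S: . stops at "\n"

# re.X allows whitespace and comments inside the pattern
date = re.compile(r"""
    (\d{4})    # year
    -(\d{2})   # month
    -(\d{2})   # day
""", re.X)
parts = date.match("2020-12-14").groups()

print(limited, doubled, repr(flagged.group()), unflagged, parts)
```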

Email regex: [A-Za-z0-9\._+]+@[A-Za-z]+\.(com|org|edu|net)
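The email pattern can be tried against a few addresses with re.fullmatch. The addresses below are invented test inputs; the pattern is copied from the line above and is deliberately narrow (single-label domains and four top-level domains only).

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9\._+]+@[A-Za-z]+\.(com|org|edu|net)")

samples = [
    "ryan_2020@example.com",      # accepted
    "user@example.io",            # rejected: .io is not in the allowed list
    "user@mail.example.com",      # rejected: the domain part allows one label only
    "first-last@example.com",     # rejected: '-' is not in the local-part class
]
results = {s: bool(EMAIL.fullmatch(s)) for s in samples}

for address, ok in results.items():
    print(address, ok)
```

fullmatch is used so that the whole string must match; with search, a valid-looking fragment inside a longer string would also be accepted.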

HTML and CSS basic syntax

Basic HTML structure

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
</body>
</html>

head

The elements allowed in the head are <title>, <style>, <meta>, <link>, <script>, <noscript> and <base>. The <head> element can hold scripts, style sheets (CSS) and various meta information.

<base> tag

The default target for all links in the document

<base href="http://www.runoob.com/images/" target="_blank">

<link>

Usually used to link to a style sheet

<link rel="stylesheet" type="text/css" href="mystyle.css">

<style> tag

Defines style rules for the HTML document directly in the page

<head>
<style type="text/css">
body {background-color:yellow}
p {color:blue}
</style>
</head>

<meta> tag

Provides metadata. Metadata is not displayed on the page, but it is parsed by the browser.

The <script> tag loads script files such as JavaScript.

body

Basics

HTML headings

Defined with the <h1> to <h6> tags.

Paragraph <p>

<p>This is a paragraph.</p>

Link <a>

<a href="http://www.baidu.com" target="_self">百度一下</a>

target="_self" opens in the same page | target="_blank" opens in a new page

title: tooltip text for the link; name: the link's name; id: the link's id

Linking to a position within the current page: <a href="#name"> points at the position marked with <a id="name">. This works like a bookmark for jumping to the place you need.

Image: img

<img src="/images/logo.png" width="258" height="39" />

The src attribute gives the image's address; the alt attribute gives the text shown when the image fails to load: <img src="images/pic.jpg" alt="product image" />

Paths: relative and absolute

Comment <!-- -->

Shortcut: Ctrl+/

Line break <br>

Horizontal rule <hr>

Space &nbsp;

Attributes

class   one or more class names for the element (the classes come from the style sheet)
id      a unique id for the element
style   inline style for the element
title   extra information about the element (shown as a tooltip)

Text formatting

<b>bold text</b> <i>italic text</i> <code>computer output</code> this is <sub>subscript</sub> and <sup>superscript</sup>

Lists

<ol>  ordered list
<ul>  unordered list
<li>  list item
<dl>  definition list
<dt>  definition list term
<dd>  description of a definition list term

Tables

A table consists of rows (defined with the <tr> tag), and each row is divided into cells (defined with the <td> tag)

Border attribute

<table border="1">
<tr>
    <td>Row 1, cell 1</td>
    <td>Row 1, cell 2</td>
</tr>
</table>

Table headers are defined with the <th> tag.

<table border="1">
<tr>
    <th>Header 1</th>
    <th>Header 2</th>
</tr>
<tr>
    <td>row 1, cell 1</td>
    <td>row 1, cell 2</td>
</tr>
<tr>
    <td>row 2, cell 1</td>
    <td>row 2, cell 2</td>
</tr>
</table>

Forms

Web crawling

Fetching page source with the requests library

Requests is a third-party Python library that is good at handling complex HTTP requests, cookies, headers (response and request headers) and the like.

import requests
link = "http://movie.douban.com/top250/"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}
r = requests.get(link, headers=headers, timeout=20).content

import requests
from bs4 import BeautifulSoup
url = 'http://www.google.com'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}
response = requests.get(url, headers=headers).text
soup = BeautifulSoup(response, 'html.parser')
print(soup.prettify())

The difference between .text and .content in requests

.text returns Unicode text (str). .content returns bytes, that is, the raw binary data. Use .text for text and .content for images and files.

A basic form can be submitted with get or post

import requests
params = {'txtAccountName': 'lijun', 'txtAccountPwd': 'leejuny2011'}
# Form fields go in data= with post; with get they would be passed as params= instead
r = requests.post("https://data.zqgame.com:5781/", data=params)
r.encoding = r.apparent_encoding   # fix garbled text
print(r.text)

Submitting files and images

import requests
files = {'uploadFile': open('../files/Python-logo.png', 'rb')}
r = requests.post("http://pythonscraping.com/pages/processing2.php", files=files)
print(r.text)

Handling logins and cookies

A session object (obtained by calling requests.Session()) keeps track of session information such as cookies, headers, and even information about the HTTP protocol in use

import requests
session = requests.Session()
params = {'username': 'username', 'password': 'password'}
s = session.post("http://pythonscraping.com/pages/cookies/welcome.php", params)
print("Cookie is set to:")
print(s.cookies.get_dict())
print("-----------")
print("Going to profile page...")
s = session.get("http://pythonscraping.com/pages/cookies/profile.php")
print(s.text)

HTTP basic access authentication

import requests
from requests.auth import HTTPBasicAuth
auth = HTTPBasicAuth('ryan', 'password')
r = requests.post(url="http://pythonscraping.com/pages/auth/login.php", auth=auth)
print(r.text)

BeautifulSoup, an HTML parsing library (static pages)

Reference: https://blog.csdn.net/youzhouliu/article/details/58586230

Fixing garbled Chinese text when crawling pages with Python

r.encoding is the text encoding used for the server's content
r.encoding = r.apparent_encoding   # fix garbled Chinese text

# xml holds the markup fetched earlier; title collects the rows
title = []
soup = BeautifulSoup(xml, 'lxml')
tbody = soup.findAll('tr', class_='alt')
for tr in tbody:
    tds = tr.find_all('td')
    title.append([tds[0].text, tds[1].text, tds[2].text, tds[3].text, tds[4].text])

Find by class name or id: soup.select('.c-gap-left-small'), soup.select('#content_bottom')
Combined lookup: soup.select('a .c-gap-left-small')
select() takes CSS selectors: as when writing CSS, tag names are unadorned, class names are prefixed with . and ids with #; the return value is a list
Find by tag name: soup.select('h3 a') selects a tags anywhere inside an h3 tag; soup.select('h3 > a') is narrower and selects only a tags that are direct children of an h3

findAll(tag, attributes, recursive, text, limit, keywords)
find(tag, attributes, recursive, text, keywords)
.findAll("span", {"class": {"green", "red"}})
findAll({"h1", "h2", "h3", "h4", "h5", "h6"})
findAll(text="the prince")   # look up by text content

The following two lines are exactly equivalent:
bsObj.findAll(id="text")
bsObj.findAll("", {"id": "text"})

bsObj.findAll(class_="green")
Alternatively, put class in quotes inside the attributes argument:
bsObj.findAll("", {"class": "green"})

Direct access: bsObj.div.h1

Navigating the tree

In BeautifulSoup, children and descendants are clearly different. As in a human family tree, a child tag is exactly one level below its parent tag, while descendant tags are all tags at every level below a parent. For example, tr tags are children of the table tag, while tr, th, td, img and span tags are all descendants of the table tag.

To find only the child tags, use .children:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html, "html.parser")
for child in bsObj.find("table", {"id": "giftList"}).children:
    print(child)

Handling sibling tags: next_siblings returns only the siblings that come after the tag. So selecting the header row and then using next_siblings selects every row of the table except the header row. previous_siblings works the same way in the other direction.
for sibling in bsObj.find("table", {"id": "giftList"}).tr.next_siblings:
    print(sibling)

print(bsObj.find("img", {"src": "../img/gifts/img1.jpg"}).parent.previous_sibling.get_text())

Regular expressions and BeautifulSoup

import re
srclist = bsObj.find_all("img", {"src": re.compile(r"\.\./img/gifts/img.*\.jpg")})
for sibling in srclist:
    print(sibling['src'])   # print the image's relative path

Getting attributes

For a tag object, myTag.attrs returns all of its attributes, and myImgTag.attrs["src"] returns a single one. This is useful when, for example, the URL an <a> tag points to is in its href attribute, or the image file of an <img> tag is in its src attribute.

Crawling

Traversing a single domain

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
html = urlopen("http://en.wikipedia.org/wiki/Kevin_Bacon")
bsObj = BeautifulSoup(html, "html.parser")
for link in bsObj.find("div", {"id": "bodyContent"}).findAll("a", href=re.compile("^(/wiki/)((?!:).)*$")):
    if 'href' in link.attrs:
        print(link.attrs['href'])

Traversing all domains

import requests
from bs4 import BeautifulSoup
import re
pages = []
def getLinks(pageUrl):
    global pages
    html = requests.get("http://en.wikipedia.org" + pageUrl)
    bsObj = BeautifulSoup(html.text, "html.parser")
    for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):
        if 'href' in link.attrs:
            if link.attrs['href'] not in pages:
                # we have found a new page
                newPage = link.attrs['href']
                print(newPage)
                pages.append(newPage)
                getLinks(newPage)
getLinks("")

Using the json library

JSON example

{
    "key1": "value1",
    "key2": [1, 2, "value2"],
    "key3": {
        "key31": "value1",
        "key32": [1, 2, "value2"],
        "key33": true
    }
}

import json
jsonString = '{"arrayOfNums":[{"number":0},{"number":1},{"number":2}], "arrayOfFruits":[{"fruit":"apple"},{"fruit":"banana"},{"fruit":"pear"}]}'
jsonObj = json.loads(jsonString)
print(jsonObj.get("arrayOfNums"))
print(jsonObj.get("arrayOfNums")[1])
print(jsonObj.get("arrayOfNums")[1].get("number") + jsonObj.get("arrayOfNums")[2].get("number"))
print(jsonObj.get("arrayOfFruits")[2].get("fruit"))

The json library has four main functions: dump, dumps, load and loads.

dump and dumps convert dictionaries and lists to JSON: dump writes the result directly to a file, dumps returns it as a string.

load and loads convert JSON data to Python objects such as dictionaries: load reads the data from a JSON file and returns the object, loads converts a JSON string.
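The four functions side by side, as a short sketch. An io.StringIO buffer stands in for a file opened with open(); that substitution is an assumption made to keep the example self-contained, and the data is invented.

```python
import io
import json

data = {"name": "xiaohua", "scores": [90, 85], "active": True, "note": None}

# dumps / loads work with strings
text = json.dumps(data, ensure_ascii=False)
restored = json.loads(text)

# dump / load work with file objects
buffer = io.StringIO()
json.dump(data, buffer)
buffer.seek(0)                 # rewind before reading back
from_file = json.load(buffer)

print(text)
print(restored == data, from_file == data)
```

Python's True and None appear as true and null in the JSON text and are converted back on loading.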

Storing data

Media files

import requests
from urllib.request import urlretrieve
from bs4 import BeautifulSoup
html = requests.get("http://www.pythonscraping.com")
bsObj = BeautifulSoup(html.text, "html.parser")
imageLocation = bsObj.find("a", {"id": "logo"}).find("img")["src"]
urlretrieve(imageLocation, "logo.jpg")

csv

import csv
csvFile = open("./files/test.csv", 'w+', newline='')   # newline='' avoids blank rows on Windows
try:
    writer = csv.writer(csvFile)
    writer.writerow(('number', 'number plus 2', 'number times 2'))
    for i in range(10):
        writer.writerow((i, i + 2, i * 2))
finally:
    csvFile.close()

File modes for reading, writing and appending csv in Python:
'r'    read only (the default; raises an error if the file does not exist)
'w'    write only (creates the file if it does not exist)
'a'    append to the end of the file (creates the file if it does not exist)
'r+'   read and write (raises an error if the file does not exist)
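The modes can be exercised in sequence on a temporary file. A sketch with made-up file names and rows; note that csv.reader returns every field as a string.

```python
import csv
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "test.csv")

with open(path, "w", newline="", encoding="utf-8") as f:   # 'w' creates the file
    csv.writer(f).writerow(["number", "number plus 2"])

with open(path, "a", newline="", encoding="utf-8") as f:   # 'a' adds rows at the end
    csv.writer(f).writerows([[i, i + 2] for i in range(3)])

with open(path, "r", newline="", encoding="utf-8") as f:   # 'r' reads them back
    rows = list(csv.reader(f))

try:
    open(os.path.join(folder, "missing.csv"), "r")         # 'r' needs an existing file
    missing_raises = False
except FileNotFoundError:
    missing_raises = True

print(rows, missing_raises)
```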

mysql

import pymysql
conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')
cur = conn.cursor()
cur.execute("USE scraping")
cur.execute("SELECT * FROM pages WHERE id=1")
print(cur.fetchone())
cur.close()
conn.close()

def store(checi, content):
    # Leave the %s placeholders unquoted: the driver quotes and escapes the values itself
    cur.execute("INSERT INTO lieche(checi,content) VALUES (%s, %s)", (checi, content))
    cur.connection.commit()
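The same parameterized-insert idea can be tried without a MySQL server by swapping in the standard-library sqlite3 module. This is a stand-in, not the pymysql API: sqlite3 uses ? as its placeholder where pymysql uses %s, and the table layout below is only assumed from the column names in store().

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE lieche (checi TEXT, content TEXT)")   # assumed schema


def store(checi, content):
    # Values are passed separately from the SQL text, so quotes or SQL
    # fragments inside them are stored as plain data
    cur.execute("INSERT INTO lieche (checi, content) VALUES (?, ?)", (checi, content))
    conn.commit()


tricky = 'He said "hi"; DROP TABLE lieche;--'
store("G101", tricky)

rows = cur.execute("SELECT checi, content FROM lieche").fetchall()
print(rows)
conn.close()
```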

selenium

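In web scraping, selenium is mainly used to handle pages rendered by JavaScript. AJAX exchanges small amounts of data with the server in the background, which lets a page update asynchronously.

[Requires selenium and ChromeDriver] Install ChromeDriver, the tool selenium uses to drive Chrome. ChromeDriver: http://npm.taobao.org/mirrors/chromedriver/ [The local Chrome is V72 and the chromedriver used with it is 2.39; check the ChromeDriver release notes for the version that matches your Chrome.] Put the extracted file in a folder that is on the PATH environment variable, such as the Python installation folder.

[Handling pages that use frames] If the iframe has a name or id, pass it directly: switch_to_frame("name value") or switch_to_frame("id value"). Newer Selenium versions spell this driver.switch_to.frame(...).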

[1. Get the driver]

from selenium import webdriver
driver = webdriver.Chrome()  # note the capital C
driver.get("https://www.dianping.com/search/category/7/10/p1")

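[2. Locate elements]

By id: find_element_by_id()
By name: find_element_by_name()
By class: find_element_by_class_name()
By link text: find_element_by_link_text()
By partial link text: find_element_by_partial_link_text()
By tag: find_element_by_tag_name()
By xpath: find_element_by_xpath()
By css: find_element_by_css_selector()
Newer Selenium 4 releases removed these helpers; the equivalent form there is find_element(By.ID, ...), find_element(By.NAME, ...), and so on.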

# coding=utf-8
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Firefox()
browser.get("http://www.baidu.com")
# ---- Ways to locate the Baidu search box ----
# find_element(By.ID, ...) is equivalent to find_element_by_id(...), and likewise for the others;
# it works on both Selenium 3 and 4
# Locate by id
browser.find_element(By.ID, "kw").send_keys("selenium")
# Locate by name
browser.find_element(By.NAME, "wd").send_keys("selenium")
# Locate by tag name
browser.find_element(By.TAG_NAME, "input").send_keys("selenium")
# Locate by class name
browser.find_element(By.CLASS_NAME, "s_ipt").send_keys("selenium")
# Locate by CSS selector
browser.find_element(By.CSS_SELECTOR, "#kw").send_keys("selenium")
# Locate by xpath
browser.find_element(By.XPATH, "//input[@id='kw']").send_keys("selenium")
# ---------------------------------------------
browser.find_element(By.ID, "su").click()
time.sleep(3)
browser.quit()

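[3. Element events]

Common WebElement methods
clear(): clears the content of the element.
click(): clicks the element.
get_attribute(name): returns the value of the property with that name if it exists; otherwise the attribute of the same name; otherwise None.
screenshot(filename): saves a screenshot of the current element as a png; an absolute path is recommended.
send_keys(value): types into the element, for example searching for '哔哩哔哩' on Baidu.
submit(): submits the form.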

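[4. Driver operations]

Common operations on the driver object
get(url): opens the given url in the current browser session, e.g. driver.get('https://www.baidu.com').
close(): closes the current browser window.
quit(): quits the webdriver and closes all windows.
refresh(): reloads the current page.
title: the title of the current page.
page_source: the source of the current page after rendering.
current_url: the url of the current page.
window_handles: the handles of all windows in the current session.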

References: https://blog.csdn.net/One_of_them/article/details/82560880 https://www.cnblogs.com/new-june/p/9599331.html https://www.jianshu.com/p/1531e12f8852

Launching the browser

Two ways to launch selenium

Normal launch

from selenium import webdriver
browser = webdriver.Chrome()
browser.get('http://www.baidu.com/')

Headless launch

from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
# Run Chrome in headless mode (no visible window)
chrome_options.add_argument('--headless')
# Without this option, locating elements sometimes fails
chrome_options.add_argument('--disable-gpu')
# Launch the browser and fetch the page source
# (options= replaces the older chrome_options= keyword)
browser = webdriver.Chrome(options=chrome_options)
mainUrl = "https://www.taobao.com/"
browser.get(mainUrl)
print(f"browser text = {browser.page_source}")
browser.quit()

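Launching the browser with a user profile

By default, Selenium launches the browser without loading any user profile. Enter chrome://version/ in Chrome's address bar to find your "Profile Path", then pass that profile when launching the browser.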

# coding=utf-8
from selenium import webdriver
option = webdriver.ChromeOptions()
# Set this to your own user data directory
option.add_argument(r'--user-data-dir=C:\Users\leeju\AppData\Local\Google\Chrome\User Data\Default')
# options= replaces the older chrome_options= keyword
driver = webdriver.Chrome(options=option)

Django

Introduction

MVC

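m: model, mainly encapsulates the database layer

v: view, presents results to the user

c: controller, the core; handles requests, fetches data, and returns results

The core idea of the MVC framework is decoupling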

MVT

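m: model, handles interaction with the database

v: view, the core; handles requests, fetches data, and returns results

t: template, renders content for the browser

Django is an MVT framework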
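The division of labor in MVT can be sketched in plain Python. This is only an illustration of the three roles and does not use Django's actual API; PageModel, PAGE_TEMPLATE, and page_view are hypothetical names, and an in-memory dict stands in for the database.

```python
from string import Template

# Model: owns data access (a dict stands in for the database)
class PageModel:
    _rows = {1: {"title": "Home"}}

    @classmethod
    def get(cls, page_id):
        return cls._rows.get(page_id)

# Template: only knows how to present data
PAGE_TEMPLATE = Template("<h1>$title</h1>")

# View: handles the request, asks the model for data, renders the template
def page_view(page_id):
    page = PageModel.get(page_id)
    if page is None:
        return "404 Not Found"
    return PAGE_TEMPLATE.substitute(title=page["title"])

html = page_view(1)
print(html)
```

Each piece can change without touching the others, which is the decoupling that both MVC and MVT aim for.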

Reference: http://www.liujiangblog.com/course/django/85

Environment setup

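What each file and directory is for:
The outer mysite/ directory has nothing to do with Django; it is only the container for your project and can be renamed freely.
manage.py: a command-line tool for interacting with Django in various ways. Very important!
The inner mysite/ directory is the actual project package; its name is the package name you use to import what is inside it, for example mysite.urls.
mysite/__init__.py: an empty file that marks the directory as a package.
mysite/settings.py: the project's main configuration file. Very important!
mysite/urls.py: the routing file, where every request starts being dispatched; it works like a table of contents for the Django-powered site. Very important!
mysite/wsgi.py: the entry point for WSGI-compatible web servers, providing the low-level network communication; you usually do not need to touch it.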

Defining models

Using the admin site

Writing views

Defining templates