Learn Python 3 the Hard Way: Contents (How to Happily Migrate to Python 3)

Date: 2021-10-12 00:36:19  Category: Script Collection


How to Happily Migrate to Python 3

Introduction

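Python has become the mainstream language for machine learning and for other scientific fields that rely heavily on data manipulation. It offers a wide range of deep learning frameworks and mature tools for data processing and visualization. However, the Python ecosystem is split between Python 2 and Python 3, and Python 2 is still in use among data scientists. Support for Python 2 ends at the end of 2019. For numpy, every new feature release after September 2018 supports only Python 3, and the same goes for pandas, matplotlib, ipython, jupyter notebook and jupyter lab. Migrating to Python 3 is therefore urgent. That is not the only reason, though: Python 3 also brings new features, which the rest of this article walks through one by one.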

Better path handling with pathlib

pathlib is a standard-library module in Python 3 (since 3.4) that saves you from writing endless os.path.join calls.

```python
from pathlib import Path

dataset = 'wiki_images'
datasets_root = Path('/path/to/datasets/')

# navigating inside a directory tree, use: /
train_path = datasets_root / dataset / 'train'
test_path = datasets_root / dataset / 'test'

for image_path in train_path.iterdir():
    with image_path.open('rb') as f:  # note: open is a method of the Path object; 'rb' because images are binary
        pass  # do something with an image
```

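Don't build paths by concatenating strings: depending on the operating system, this can produce wrong paths. Joining paths with pathlib's / operator is safe, convenient and highly readable.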

pathlib has many more attributes and methods; see the official pathlib documentation for details. Here are a few of them:

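```python
from pathlib import Path

a = Path("/data")
b = "test"
c = a / b
print(c)
print(c.exists())  # whether the path exists
print(c.is_dir())  # whether it is a directory
print(c.parts)  # split the path into its components
print(c.with_name('sibling.png'))  # replace only the final name; nothing is renamed on disk
print(c.with_suffix('.jpg'))  # replace only the extension; nothing is renamed on disk
c.chmod(0o777)  # change permissions (the mode is an octal literal; fails if c does not exist)
c.rmdir()  # delete the directory (it must exist and be empty)
```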
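The snippet above assumes /data/test already exists. A self-contained sketch of the same operations runs inside a temporary directory instead, so `exists()`, `mkdir()` and `rmdir()` have something real to act on. The names `data`, `test` and `cat.png` are made up for illustration:

```python
import tempfile
from pathlib import Path

# Work inside a throwaway directory so the example has no side effects
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    c = root / "data" / "test"          # '/' joins path components
    print(c.exists())                   # False: nothing created yet
    c.mkdir(parents=True)               # create data/ and data/test/
    print(c.exists(), c.is_dir())       # True True

    image = c / "cat.png"               # a path object; no file is created
    print(image.name, image.suffix, image.stem)  # cat.png .png cat
    print(image.with_name("sibling.png").name)   # sibling.png
    print(image.with_suffix(".jpg").name)        # cat.jpg
    print(c.parts[-2:])                          # ('data', 'test')

    c.rmdir()                           # remove the (empty) test directory
    print(c.exists())                   # False
```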

Type hints are now part of the language

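Type hints were introduced to help cope with the growing complexity of programs: IDEs such as PyCharm can recognize the types of parameters and give the user hints based on them.

For detailed usage of typing, see my earlier article: The Ultimate Guide to Python Type Checking: Using typing.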
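A minimal sketch of annotated functions (the function names are illustrative, not taken from the article above). The annotations do not change what the code does at runtime; they are metadata that IDEs such as PyCharm and static checkers such as mypy read:

```python
from typing import Dict, List, Optional, get_type_hints


def mean_score(scores: List[float], default: Optional[float] = None) -> Optional[float]:
    """Return the mean of scores, or `default` when the list is empty."""
    if not scores:
        return default
    return sum(scores) / len(scores)


def count_words(text: str) -> Dict[str, int]:
    """Count how often each whitespace-separated word occurs."""
    counts: Dict[str, int] = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


# The hints are stored on the function object and can be inspected
print(get_type_hints(count_words))
print(mean_score([1.0, 2.0, 3.0]))  # 2.0
```

Passing a wrong type here would not raise anything by itself; catching it is the job of a static checker or of a runtime tool like the one in the next section.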

Runtime checking of type hints

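Besides mypy, which the earlier article covers for type checking, you can also check types at runtime with the enforce module, which can be installed with pip. Example: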

```python
import enforce

@enforce.runtime_validation
def foo(text: str) -> None:
    print(text)

foo('hi')  # ok
foo(5)  # fails
```

Output:

```text
hi
Traceback (most recent call last):
  File "/users/chennan/pythonproject/dataanalysis/e.py", line 10, in <module>
    foo(5) # fails
  File "/users/chennan/desktop/2019/env/lib/python3.6/site-packages/enforce/decorators.py", line 104, in universal
    _args, _kwargs, _ = enforcer.validate_inputs(parameters)
  File "/users/chennan/desktop/2019/env/lib/python3.6/site-packages/enforce/enforcers.py", line 86, in validate_inputs
    raise RuntimeTypeError(exception_text)
enforce.exceptions.RuntimeTypeError:
  The following runtime type errors were encountered:
       Argument 'text' was not of type <class 'str'>. Actual type was int.
```
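enforce is a third-party package. To see the idea without installing anything, here is a hand-rolled sketch built only on the standard library's `inspect.signature` and `typing.get_type_hints`. It is not enforce's implementation; the decorator name `runtime_validation` simply mirrors the example above, and it only checks annotations that are plain classes such as `str` or `int`:

```python
import functools
import inspect
from typing import get_type_hints


def runtime_validation(func):
    """Raise TypeError when an argument does not match its annotation.

    Simplified sketch: only annotations that are plain classes (str, int, ...)
    are checked; typing constructs such as List[int] are skipped.
    """
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments onto parameter names
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # type(...) is type filters out None and typing constructs
            if type(expected) is type and not isinstance(value, expected):
                raise TypeError(
                    f"Argument '{name}' was not of type {expected}. "
                    f"Actual type was {type(value).__name__}."
                )
        return func(*args, **kwargs)

    return wrapper


@runtime_validation
def foo(text: str) -> None:
    print(text)


foo('hi')  # ok
try:
    foo(5)  # fails
except TypeError as exc:
    print(exc)
```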

Using @ for matrix multiplication

Below we implement one of the simplest ML models, L2-regularized linear regression (also known as ridge regression):

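```python
import numpy as np

# hypothetical example data: 100 samples, 5 features
A = np.random.rand(100, 5)
y = np.random.rand(100)
alpha = 0.1

# L2-regularized linear regression: || A x - y ||^2 + alpha * || x ||^2 -> min over x

# Python 2
x = np.linalg.inv(np.dot(A.T, A) + alpha * np.eye(A.shape[1])).dot(A.T.dot(y))

# Python 3
x = np.linalg.inv(A.T @ A + alpha * np.eye(A.shape[1])) @ (A.T @ y)
```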

With the @ operator, the code becomes more readable and easier to port between scientific computing libraries such as numpy, cupy, pytorch and tensorflow.
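The @ operator comes from PEP 465: `a @ b` calls `a.__matmul__(b)`, which is why NumPy, PyTorch and similar libraries can all support it. A toy pure-Python class (illustrative only, and far slower than NumPy) shows the mechanism:

```python
class Matrix:
    """Tiny dense matrix, only to show how `@` dispatches to __matmul__."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # Row-by-column product: (n x k) @ (k x m) -> (n x m)
        cols = list(zip(*other.rows))
        return Matrix([[sum(a * b for a, b in zip(row, col)) for col in cols]
                       for row in self.rows])

    def __repr__(self):
        return f"Matrix({self.rows})"


A = Matrix([[1, 2], [3, 4]])
eye = Matrix([[1, 0], [0, 1]])
print(A @ eye)  # Matrix([[1, 2], [3, 4]])
print(A @ A)    # Matrix([[7, 10], [15, 22]])
```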

Using the ** wildcard

In Python 2, searching for files recursively was not easy, even with the glob module. Since Python 3.5, the ** wildcard makes it simple.

```python
import glob

# Python 2
found_images = (
    glob.glob('/path/*.jpg')
    + glob.glob('/path/*/*.jpg')
    + glob.glob('/path/*/*/*.jpg')
    + glob.glob('/path/*/*/*/*.jpg')
    + glob.glob('/path/*/*/*/*/*.jpg'))

# Python 3
found_images = glob.glob('/path/**/*.jpg', recursive=True)
```

An even better way to handle paths is pathlib, introduced above, so the code can be rewritten further:

```python
# Python 3
import pathlib

found_images = pathlib.Path('/path/').glob('**/*.jpg')  # a lazy generator of Path objects
```
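To see what `**` actually matches, here is a sketch that builds a small hypothetical directory tree in a temporary directory and searches it. Note that `**` also matches zero directories, so files directly under the root are found too, and `Path.rglob('*.jpg')` is shorthand for `Path.glob('**/*.jpg')`:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical tree: a.jpg, sub/b.jpg, sub/deeper/c.jpg, sub/notes.txt
    (root / "sub" / "deeper").mkdir(parents=True)
    for name in ["a.jpg", "sub/b.jpg", "sub/deeper/c.jpg", "sub/notes.txt"]:
        (root / name).touch()

    # glob() returns a generator, so consume it before the directory is deleted
    found = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.jpg"))
    found_r = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.jpg"))

print(found)             # ['a.jpg', 'sub/b.jpg', 'sub/deeper/c.jpg']
print(found == found_r)  # True
```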

The print function

Yes, print in Python 3 needs a pair of parentheses, but that does not take away from its advantages.

Printing to a file descriptor

```python
import sys

print >>sys.stderr, "critical error"      # Python 2
print("critical error", file=sys.stderr)  # Python 3
```

Joining values without str.join

```python
# Python 3
print(*array, sep=' ')
print(batch, epoch, loss, accuracy, time, sep=' ')
```
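A runnable sketch of the two features above, with hypothetical training values. Writing into an `io.StringIO` makes the exact output visible: `sep` controls what goes between arguments, and `file` accepts any object with a `write()` method:

```python
import io
import sys

batch, epoch, loss = 120, 12, 0.25  # hypothetical training values

buf = io.StringIO()
print(batch, epoch, loss, sep='\t', file=buf)  # tab-separated, no str.join needed
print(*[1, 2, 3], sep=', ', file=buf)          # unpack a list into separate arguments
print("critical error", file=sys.stderr)       # same mechanism, different target

print(repr(buf.getvalue()))  # '120\t12\t0.25\n1, 2, 3\n'
```

Unlike `'\t'.join(...)`, `print` converts each argument with `str()` itself, so numbers need no manual conversion.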

Redefining the behavior of print

Since print is a function in Python 3, we can override it.

```python
# Python 3
_print = print  # store the original print function

def print(*args, **kwargs):
    pass  # do something useful, e.g. store output to some file
```

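Note: in Jupyter it is a good idea to log each output to a separate file (to keep track of what happened after you got disconnected), and being able to override print makes that possible.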

```python
import builtins
import contextlib

@contextlib.contextmanager
def replace_print():
    _print = builtins.print  # saving old print function
    # or use some other function here
    builtins.print = lambda *args, **kwargs: _print('new printing', *args, **kwargs)
    try:
        yield
    finally:
        builtins.print = _print  # restore the original even if the block raises

with replace_print():
    print('hello')  # code here will invoke the other print function
```

The code above does manage to override print, but it is a hack and is not recommended.
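If the goal is only to capture or redirect output, the standard library's `contextlib.redirect_stdout` (Python 3.4+) does it without touching `builtins` at all. This is an alternative technique, not what the snippet above does:

```python
import contextlib
import io

log = io.StringIO()
with contextlib.redirect_stdout(log):
    print("epoch 1 done")  # written to `log`, not to the console
    print("epoch 2 done")

print(log.getvalue().splitlines())  # ['epoch 1 done', 'epoch 2 done']
```

To log into a real file, open it with `open(..., 'w')` and pass the file object instead of the `StringIO`.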

print can take part in list comprehensions and other language constructs

```python
# Python 3
result = process(x) if is_valid(x) else print('invalid item: ', x)
```

Underscores in numeric literals (thousands separators)

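PEP 515 introduced underscores in numeric literals. In Python 3 (3.6+), underscores can be used in integer, floating-point and complex literals, where they group digits visually: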

```python
# grouping decimal numbers by thousands
one_million = 1_000_000

# grouping hexadecimal addresses by words
addr = 0xCAFE_F00D

# grouping bits into nibbles in a binary literal
flags = 0b_0011_1111_0100_1110

# same, for string conversions
flags = int('0b_1111_0000', 2)
```

In other words, 10000 can be written as 10_000.
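PEP 515 also added `_` as a grouping option in format specifications, so the same grouping works in output. A short sketch:

```python
one_million = 1_000_000
print(one_million == 1000000)          # True: the underscores are purely visual
print(f"{one_million:_}")              # 1_000_000
print(f"{one_million:,}")              # 1,000,000
print(f"{0xCAFE_F00D:_x}")             # cafe_f00d (hex groups of 4 digits)
print(int("1_000"), float("1_000.5"))  # 1000 1000.5
```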

Simple, readable string formatting with f-strings

The string formatting that Python 2 offers is not good enough: it is verbose and clumsy. Logging code typically looked like this:

```python
# Python 2
print '{batch:3} {epoch:3} / {total_epochs:3} accuracy: {acc_mean:0.4f}±{acc_std:0.4f} time: {avg_time:3.2f}'.format(
    batch=batch, epoch=epoch, total_epochs=total_epochs,
    acc_mean=numpy.mean(accuracies), acc_std=numpy.std(accuracies),
    avg_time=time / len(data_batch)
)

# Python 2 (too error-prone during fast modifications, please avoid):
print '{:3} {:3} / {:3} accuracy: {:0.4f}±{:0.4f} time: {:3.2f}'.format(
    batch, epoch, total_epochs, numpy.mean(accuracies), numpy.std(accuracies),
    time / len(data_batch)
)
```

Output:

```text
120  12 / 300 accuracy: 0.8180±0.4649 time: 56.60
```

Python 3.6 introduced f-strings (formatted string literals):

```python
print(f'{batch:3} {epoch:3} / {total_epochs:3} accuracy: {numpy.mean(accuracies):0.4f}±{numpy.std(accuracies):0.4f} time: {time / len(data_batch):3.2f}')
```

For more on using f-strings, see my video on Bilibili: [https://www.bilibili.com/video/av31608754]
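A self-contained version of the log line above, wrapped in a hypothetical `format_log` helper. It swaps numpy for the standard library's `statistics` module so it runs without dependencies; `pstdev` is the population standard deviation, which matches `numpy.std`'s default:

```python
from statistics import mean, pstdev


def format_log(batch, epoch, total_epochs, accuracies, elapsed, n_batches):
    # Any expression, plus an optional format spec, can go inside the braces
    return (f'{batch:3} {epoch:3} / {total_epochs:3} '
            f'accuracy: {mean(accuracies):0.4f}±{pstdev(accuracies):0.4f} '
            f'time: {elapsed / n_batches:3.2f}')


print(format_log(120, 12, 300, [0.5, 1.0], 113.2, 2))
# 120  12 / 300 accuracy: 0.7500±0.2500 time: 56.60
```

`{epoch:3}` pads 12 to a width of 3 (` 12`), which is why two spaces appear after `120`.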

A clear distinction between '/' and '//' in arithmetic

For data science, this is clearly a welcome change:

```python
import pandas

data = pandas.read_csv('timing.csv')
velocity = data['distance'] / data['time']
```

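In Python 2, the result depends on whether 'time' and 'distance' (e.g. measured in meters and seconds) are stored as integers. In Python 3, the result is correct in both cases, because `/` always performs true division and returns a float.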

Another example is floor division, which is now an explicit operation:

```python
n_gifts = money // gift_price  # correct for int and float arguments
```

In a nutshell:

```python
>>> from operator import truediv, floordiv
>>> truediv.__doc__, floordiv.__doc__
('truediv(a, b) -- Same as a / b.', 'floordiv(a, b) -- Same as a // b.')
>>> (3 / 2), (3 // 2), (3.0 // 2.0)
(1.5, 1, 1.0)
```

Note that these rules apply both to built-in types and to custom types provided by data packages (e.g. numpy or pandas).
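Custom types get this behavior because `/` and `//` dispatch to two separate methods, `__truediv__` and `__floordiv__`. A sketch with a hypothetical `Money` class, plus the built-in results for comparison (note that `//` rounds toward negative infinity):

```python
class Money:
    """Hypothetical class: `/` and `//` dispatch to two different methods."""

    def __init__(self, cents):
        self.cents = cents

    def __truediv__(self, other):   # called for money / other
        return self.cents / other.cents

    def __floordiv__(self, other):  # called for money // other
        return self.cents // other.cents


print(7 / 2, 7 // 2, 7.0 // 2, -7 // 2)  # 3.5 3 3.0 -4
budget, price = Money(1000), Money(300)
print(budget // price)                   # 3
print(budget / price)                    # a float close to 3.333
```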

Strict ordering

All of the following comparisons were legal in Python 2, but in Python 3 each of them raises a TypeError:

```python
3 < '3'
2 < None
(3, 4) < (3, None)
(4, 5) < [4, 5]
```

The following comparison, on the other hand, is legal in both Python 2 and Python 3 and simply evaluates to False:

```python
(4, 5) == [4, 5]
```

Sorting values of different types:

```python
sorted([2, '1', 3])
```

In Python 2 this returns [2, 3, '1'], but in Python 3 it is not allowed and raises a TypeError.
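If mixed types really have to be sorted in Python 3, the ordering must be spelled out with a `key` function. Two possible keys are sketched below; the second one reproduces the `[2, 3, '1']` result by sorting numbers before strings:

```python
items = [2, '1', 3]

try:
    sorted(items)
except TypeError as exc:
    print("TypeError:", exc)

# Compare everything as strings
print(sorted(items, key=str))  # ['1', 2, 3]

# Numbers first, then strings; the tuple's first element keeps
# int and str values from ever being compared directly
print(sorted(items, key=lambda v: (isinstance(v, str), v)))  # [2, 3, '1']
```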

The sensible way to check whether an object is None

```python
if a is not None:
    pass

if a:  # wrong check for None: this also skips 0, '', [] and other falsy values
    pass
```
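The difference matters as soon as a legitimate value is falsy. A sketch with a hypothetical `describe` helper that applies both checks to several values:

```python
def describe(a):
    # `is not None` rejects only None; truthiness also rejects 0, '', [] ...
    return ("set" if a is not None else "missing",
            "truthy" if a else "falsy")


for value in [None, 0, '', [], 5]:
    print(repr(value), describe(value))
# None ('missing', 'falsy')
# 0 ('set', 'falsy')
# '' ('set', 'falsy')
# [] ('set', 'falsy')
# 5 ('set', 'truthy')
```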

Unicode issues in NLP

```python
s = '您好'
print(len(s))
print(s[:2])
```

Output:

```text
Python 2: 6
Python 3: 2
您好
```

And the following operations:

```python
x = u'со'
x += 'co'  # ok: Latin letters
x += 'со'  # fails in Python 2: Cyrillic letters in a byte string
```

This fails in Python 2 but works in Python 3 (because I used Russian letters in the string).

In Python 3, all strings are Unicode, which makes non-English text much easier to handle.
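In Python 3, text (`str`) and raw bytes (`bytes`) are separate types, and encoding is an explicit step. A short sketch using the strings from the examples above:

```python
s = '您好'
data = s.encode('utf-8')  # str -> bytes

print(type(s).__name__, len(s))        # str 2: counts characters (code points)
print(type(data).__name__, len(data))  # bytes 6: counts UTF-8 bytes
print(data.decode('utf-8') == s)       # True: bytes -> str round trip

x = 'со'   # Cyrillic letters
x += 'co'  # Latin letters; both are str, so concatenation just works
print(len(x))  # 4
```

The 6 here is the same number Python 2 reported for `len(s)`, because a Python 2 byte string counts encoded bytes rather than characters.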

Some other operations:

```python
'a' < type < u'a'  # Python 2: True
'a' < u'a'         # Python 2: False
```

Another example:

```python
from collections import Counter
Counter('möbelstück')
```

In Python 2:

```text
Counter({'\xc3': 2, 'b': 1, 'e': 1, 'c': 1, 'k': 1, 'm': 1, 'l': 1, 's': 1, 't': 1, '\xb6': 1, '\xbc': 1})
```

In Python 3:

```text
Counter({'m': 1, 'ö': 1, 'b': 1, 'e': 1, 'l': 1, 's': 1, 't': 1, 'ü': 1, 'c': 1, 'k': 1})
```

You can get correct results in Python 2 as well by handling the encoding manually, but Python 3 is much friendlier.

Dicts and **kwargs keep their order

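In CPython 3.6+, dicts remember insertion order by default, much like OrderedDict (they keep the order in which keys were inserted; they do not sort them). This became a language guarantee in Python 3.7. Order is also preserved by dict comprehensions and other operations, e.g. JSON serialization/deserialization: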

```python
import json

x = {str(i): i for i in range(5)}
json.loads(json.dumps(x))

# Python 2
{u'1': 1, u'0': 0, u'3': 3, u'2': 2, u'4': 4}

# Python 3
{'0': 0, '1': 1, '2': 2, '3': 3, '4': 4}
```

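The same applies to **kwargs (in Python 3.6+): they keep the order in which they appear in the call. Order is crucial in data pipelines, and previously we had to write them in a cumbersome way: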

```python
from collections import OrderedDict

from torch import nn

# Python 2
model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU())
]))
```

From Python 3.6 on, you can instead write:
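A minimal sketch of what keyword-order preservation (PEP 468) makes possible. It uses a hypothetical `make_pipeline(**steps)` helper with strings standing in for layers; it is not a PyTorch API:

```python
def make_pipeline(**steps):
    """Hypothetical helper: return (name, step) pairs in the order they were passed.

    Relies on **kwargs keeping call order, guaranteed since Python 3.6 (PEP 468).
    """
    return list(steps.items())


pipeline = make_pipeline(conv1='Conv2d(1, 20, 5)', relu1='ReLU()',
                         conv2='Conv2d(20, 64, 5)', relu2='ReLU()')
print([name for name, _ in pipeline])  # ['conv1', 'relu1', 'conv2', 'relu2']

# Plain dicts keep insertion order too (a language guarantee since Python 3.7)
d = {'b': 1, 'a': 2}
d['c'] = 3
print(list(d))  # ['b', 'a', 'c']
```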
