Python Tutorial 011: Differences Between the str, bytes, and bytearray Types

Differences between str and bytes

Most operations available on str also work on bytes. There are, however, some differences worth noting.

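Python 3 draws a clear line between text and binary data. Text is represented by the str type, a sequence of Unicode characters (code points), and is meant for display. Binary data is represented by the bytes type, a sequence of bytes (integers from 0 to 255), and is meant for storage and transmission. Both str and bytes are immutable.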

The str type:

>>> s = '你好'
>>> s
'你好'
>>> type(s)
<class 'str'>

The bytes type:

>>> b = b'abc'
>>> b
b'abc'
>>> type(b)
<class 'bytes'>
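A small sketch (variable names are illustrative) that makes the "sequence of characters" versus "sequence of bytes" distinction visible: iterating a str yields one-character strings whose code points ord() reports, iterating bytes yields plain integers, and neither type allows item assignment.

```python
s = '你好'        # str: a sequence of Unicode code points
b = b'abc'        # bytes: a sequence of integers in range(256)

# Each str element is a one-character str; ord() gives its code point.
print([hex(ord(ch)) for ch in s])   # ['0x4f60', '0x597d']

# Each bytes element is already an int.
print(list(b))                       # [97, 98, 99]

# Both types are immutable: item assignment raises TypeError.
for obj in (s, b):
    try:
        obj[0] = obj[0]
    except TypeError as exc:
        print(type(obj).__name__, 'is immutable:', exc)
```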

Converting between str and bytes

Method 1: encode() and decode()
str → encode() → bytes → decode() → str

>>> a = '你好'
>>> b = a.encode('utf-8')
>>> b
b'\xe4\xbd\xa0\xe5\xa5\xbd'
>>> type(b)
<class 'bytes'>
>>> new_a = b.decode('utf-8')
>>> new_a
'你好'
>>> type(new_a)
<class 'str'>
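The codec you pass to encode() and decode() matters. As a sketch, assuming the 'gbk' codec that ships with CPython's standard library, the same text produces different bytes under different codecs, decoding only round-trips with the matching codec, and the errors parameter controls what happens with invalid input:

```python
text = '你好'

utf8 = text.encode('utf-8')   # 3 bytes per character here
gbk = text.encode('gbk')      # 2 bytes per character in GBK

print(utf8, gbk)

# Decoding must use the same codec that was used for encoding.
assert utf8.decode('utf-8') == text
assert gbk.decode('gbk') == text

# Decoding with the wrong codec fails (or silently produces mojibake).
try:
    utf8.decode('ascii')
except UnicodeDecodeError as exc:
    print('ascii cannot decode:', exc.reason)

# A truncated UTF-8 sequence: the last character is missing one byte.
broken = utf8[:-1]
print(broken.decode('utf-8', errors='ignore'))   # drops the incomplete tail
print(broken.decode('utf-8', errors='replace'))  # marks it with U+FFFD
```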

Method 2: the bytes() and str() constructors

>>> a = '你好'
>>> b = bytes(a, encoding='utf-8')
>>> b
b'\xe4\xbd\xa0\xe5\xa5\xbd'
>>> type(b)
<class 'bytes'>
>>> new_a = str(b, encoding='utf-8')
>>> new_a
'你好'
>>> type(new_a)
<class 'str'>
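The constructor approach has two common pitfalls, shown in the short sketch below: str() called on bytes without an encoding returns the repr of the bytes rather than the decoded text, and bytes() called on a str without an encoding raises TypeError. bytes() also accepts an iterable of integers or a length.

```python
data = b'\xe4\xbd\xa0\xe5\xa5\xbd'

# Without an encoding, str() returns the repr of the bytes, not the text.
print(str(data))                    # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(str(data, encoding='utf-8'))  # 你好

# bytes() on a str requires an encoding.
try:
    bytes('你好')
except TypeError as exc:
    print('TypeError:', exc)

# Other bytes() forms: from an iterable of ints, or a zero-filled buffer.
print(bytes([72, 105]))  # b'Hi'
print(bytes(3))          # b'\x00\x00\x00'
```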

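The len() function counts characters for a str, but bytes for a bytes object.
Be careful to distinguish 'ABC' from b'ABC': the former is a str, the latter is a bytes object. Every element of a bytes object occupies exactly one byte, so a bytes literal may contain only ASCII characters; other byte values must be written as escapes such as \xe4.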

For example:

s = '中文'
b = s.encode('utf-8')
print(s)
print(len(s)) # length of the str (characters)

print(b)
print(len(b)) # length of the bytes (bytes)

Running the program above prints:

中文
2
b'\xe4\xb8\xad\xe6\x96\x87'
6
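Why does '中文' take 6 bytes? UTF-8 uses 1 byte for code points below U+0080, 2 bytes below U+0800, 3 bytes below U+10000 (which covers most CJK characters) and 4 bytes above that. The helper utf8_len below is a hypothetical illustration of that rule, checked against len(s.encode('utf-8')); it ignores lone surrogates, which cannot be encoded anyway.

```python
def utf8_len(text: str) -> int:
    """Hypothetical helper: number of bytes text occupies in UTF-8."""
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            total += 1      # ASCII
        elif cp < 0x800:
            total += 2      # e.g. accented Latin letters such as 'é'
        elif cp < 0x10000:
            total += 3      # most CJK characters, e.g. '中'
        else:
            total += 4      # e.g. emoji outside the Basic Multilingual Plane
    return total

for sample in ['ABC', '中文', 'héllo', '😀']:
    encoded = sample.encode('utf-8')
    print(repr(sample), len(sample), 'chars,', len(encoded), 'bytes')
    assert utf8_len(sample) == len(encoded)
```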

Indexing a bytes object returns an integer rather than a single character. For example:

>>> a = 'Hello World' # str object
>>> a[0]
'H'
>>> a[1]
'e'
>>> b = b'Hello World' # bytes object
>>> b[0]
72
>>> b[1]
101
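A short sketch of how to work with that behavior: slicing (unlike indexing) returns bytes, chr() and bytes([...]) turn an integer back into a character or a length-1 bytes object, and comparing an element against b'H' is always False because an int never equals a bytes object.

```python
b = b'Hello World'

first = b[0]        # indexing gives an int: 72
head = b[0:1]       # slicing gives bytes: b'H'

print(first, head)
print(chr(first))             # H  (int -> one-character str)
print(bytes([first]))         # b'H'  (int -> length-1 bytes)
print(b[0] == ord('H'))       # True
print(b[0] == b'H')           # False: an int is never equal to bytes

# Iterating also yields ints.
print([x for x in b[:5]])     # [72, 101, 108, 108, 111]
```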

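A bytes object has no human-friendly text form and does not print nicely: print() shows its repr, with the b'' prefix and escape sequences, unless you first decode it into a text string.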

For example:

>>> a = 'Hello World' # str object
>>> print(a)
Hello World
>>> b = b'Hello World'
>>> print(b)
b'Hello World'
>>> print(b.decode('ascii'))
Hello World
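When the bytes are not valid text, decoding alone does not help. A sketch of other readable views using standard methods: bytes.hex() (its separator argument needs Python 3.8 or later), bytes.fromhex() to go back, and errors='backslashreplace' to decode while keeping invalid bytes visible as escapes. The sample data is made up for illustration.

```python
data = b'Hello \xff World'   # 0xff is not valid ASCII (or UTF-8)

print(data)                  # the repr, with the \xff escape
print(data.hex())            # 48656c6c6f20ff20576f726c64
print(data.hex(' '))         # same digits, one space between bytes (3.8+)

# fromhex() reverses hex().
assert bytes.fromhex(data.hex()) == data

# Decode, but keep undecodable bytes as visible \x escapes.
print(data.decode('ascii', errors='backslashreplace'))  # Hello \xff World
```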

Similarly, bytes has no format() method. In Python 3.4 and earlier the % operator was not supported on bytes either, but since Python 3.5 (PEP 461) %-style formatting works on bytes and bytearray:

>>> b'%10s %10d %10.2f' % (b'ACME', 100, 490.1)
b'      ACME        100     490.10'
>>> b'{} {} {}'.format(b'ACME', 100, 490.1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'format'

To build formatted bytes with format() (or an f-string), format an ordinary str first and then encode it to bytes. For example:

>>> '{:10s} {:10d} {:10.2f}'.format('ACME', 100, 490.1).encode('ascii')
b'ACME              100     490.10'
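As a sketch, the three routes below should produce identical bytes: format() then encode, an f-string (Python 3.6+) then encode, and %-formatting directly on bytes (Python 3.5+). Note that {:10s} left-aligns text, so the % version needs the - flag to match, and %s on bytes expects a bytes argument rather than a str.

```python
name, shares, price = 'ACME', 100, 490.1

# Format as str first, then encode (works on every Python 3 version).
via_format = '{:10s} {:10d} {:10.2f}'.format(name, shares, price).encode('ascii')
via_fstring = f'{name:10s} {shares:10d} {price:10.2f}'.encode('ascii')

# Python 3.5+ (PEP 461): %-formatting on bytes; %s needs a bytes argument.
via_percent = b'%-10s %10d %10.2f' % (name.encode('ascii'), shares, price)

print(via_format)
assert via_format == via_fstring == via_percent
```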

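Finally, be aware that using bytes can change the semantics of some operations, especially file-system operations. For example, if you pass a filename as bytes instead of as an ordinary text string, Python skips encoding and decoding the filename: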

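>>> # Write a UTF-8 filename
>>> with open('jalape\xf1o.txt', 'w') as f:
...     f.write('spicy')
...
5
>>> # Get a directory listing
>>> import os
>>> os.listdir('.') # Text string (names are decoded)
['jalapeño.txt']
>>> os.listdir(b'.') # Byte string (names left as bytes)
[b'jalapen\xcc\x83o.txt']

(This listing assumes the directory contains only this file. The bytes shown are the decomposed (NFD) form stored by macOS's HFS+ file system; on Linux you would typically see b'jalape\xc3\xb1o.txt' instead.)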
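To move between the two forms explicitly, the standard library provides os.fsencode() and os.fsdecode(), which use the file system encoding. The sketch below assumes a file system that can store the non-ASCII name; it works in a throwaway temporary directory so the listing holds only the file it creates.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a file with a non-ASCII name (assumes a UTF-8-capable file system).
    with open(os.path.join(tmp, 'jalape\xf1o.txt'), 'w') as f:
        f.write('spicy')

    text_names = os.listdir(tmp)                 # str path in -> str names out
    byte_names = os.listdir(os.fsencode(tmp))    # bytes path in -> bytes names out
    print(text_names, byte_names)

    # os.fsdecode() turns the raw bytes names back into the str names.
    assert sorted(os.fsdecode(n) for n in byte_names) == sorted(text_names)
```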

Differences between bytes and bytearray

bytes is immutable; bytearray is mutable.

# a is a bytearray, b is bytes
a = bytearray("abc", encoding='utf-8')
b = b"abc"

print(a)
print(b)

print(a[0:1])
print(b[0:1])

print(type(a[0:1]))  # type of a[0:1]: <class 'bytearray'>
print(type(b[0:1]))  # type of b[0:1]: <class 'bytes'>

a[0:1] = bytearray("d", encoding='utf-8')  # modify a[0:1] in place
print(a)

b[0:1]= b"d"  # bytes is immutable, so this raises TypeError
print(b)

Running the program above prints:

bytearray(b'abc')
b'abc'
bytearray(b'a')
b'a'
<class 'bytearray'>
<class 'bytes'>
bytearray(b'dbc')
Traceback (most recent call last):
  File "/home/main.py", line 17, in <module>
    b[0:1]= b"d"
TypeError: 'bytes' object does not support item assignment
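Because bytearray is mutable, it works well as a buffer that you build up or edit in place, while the same changes on bytes would have to create new objects. A minimal sketch of the common in-place operations (the variable names are illustrative):

```python
buf = bytearray(b'abc')

buf[0] = ord('d')          # single item: must be an int in range(256)
buf[1:2] = b'XYZ'          # slice assignment may change the length
buf.append(ord('!'))       # append one byte
buf.extend(b'??')          # append several bytes
print(buf)                 # bytearray(b'dXYZc!??')

try:
    buf[0] = 'd'           # a str is not accepted for a single item
except TypeError as exc:
    print('TypeError:', exc)

frozen = bytes(buf)        # immutable snapshot of the current contents
del buf[:3]                # mutating buf does not affect the snapshot
print(buf, frozen)         # bytearray(b'Zc!??') b'dXYZc!??'
```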
