# Python string and bytes

In short, bytes and string are related as follows:

bytes ---decode--> string
bytes <--encode--- string

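Two common representations and what they look like:

* utf8: raw bytes such as `\xe4\xbb\x8a\xe5\xa4\xa9` (the UTF-8 encoding of 今天)
* unicode: escape sequences such as `\u4eca\u5929\u5929\u6c14\u4e0d\u9519` (the code points of 今天天气不错)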

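If a `\` shows up as `\\`, the value holds escape text (a literal backslash followed by `u4eca`, for example) rather than the decoded characters. The doubled backslash is only how Python displays one literal backslash in the repr of a str or bytes object; the data itself still contains a single backslash.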
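
To make the two notations concrete, the sketch below encodes the same two characters both ways and shows that the doubled backslash appears only in the repr. The variable names are illustrative and not part of the original note.

```python
text = "今天"

utf8_bytes = text.encode("utf-8")         # real UTF-8 bytes, 3 per character here
escaped = text.encode("unicode_escape")   # ASCII escape text: \u4eca\u5929

print(utf8_bytes)               # b'\xe4\xbb\x8a\xe5\xa4\xa9'
print(escaped)                  # b'\\u4eca\\u5929' (the repr doubles each backslash)
print(len(escaped))             # 12: each escape is one backslash plus five characters
print(escaped.decode("ascii"))  # \u4eca\u5929 (printed text shows single backslashes)
```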

## 1.string to bytes

```python
s = "abc"           # str
s = "abc".encode()  # bytes; encode() defaults to utf-8
s = b"abc"          # bytes literal
# or
s = "abc"           # str
s = bytes(s, encoding="utf8")   # bytes
```
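
With ASCII-only text every encoding gives the same bytes, so the choice of encoding is easy to overlook. The following sketch, using an illustrative sample string, shows how the byte length depends on the encoding and what happens when an encoding cannot represent the text.

```python
text = "今天"

print(len(text))                      # 2 characters
print(len(text.encode("utf-8")))      # 6 bytes
print(len(text.encode("gbk")))        # 4 bytes
print(len(text.encode("utf-16-le")))  # 4 bytes

ascii_error = None
try:
    text.encode("ascii")              # ASCII has no Chinese characters
except UnicodeEncodeError as exc:
    ascii_error = exc
    print("ascii cannot encode:", exc.object[exc.start:exc.end])
```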

## 2.bytes to string

```python
s = b"abc"          # bytes
s = b"abc".decode() # str; decode() defaults to utf-8
s = str(b"abc")     # str, but it is "b'abc'" (the repr), not "abc"; always pass an encoding
# or
s = b"abc"          # bytes
s = str(s, encoding="utf8")     # str: "abc"
```
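
Decoding can fail when the bytes are incomplete or are not in the expected encoding. As a minimal sketch (the variable names are illustrative), the code below contrasts `str()` without an encoding, a strict decode, and a lenient decode with `errors="replace"`.

```python
data = "今天".encode("utf-8")     # b'\xe4\xbb\x8a\xe5\xa4\xa9'

wrong = str(data)                 # the repr of the bytes object, not the text
right = data.decode("utf-8")      # '今天'

truncated = data[:-1]             # the last character is now incomplete
try:
    truncated.decode("utf-8")     # strict decoding is the default
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD for bytes that cannot be decoded
lenient = truncated.decode("utf-8", errors="replace")
print(right, strict_failed, lenient)
```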

## 3.Printing unicode escapes (Chinese) held as bytes

```python
s = '\\u4eca\\u5929\\u5929\\u6c14\\u4e0d\\u9519'    # escape text for: 今天天气不错
new_s = s.encode().decode('unicode_escape')         # new_s is: 今天天气不错
```
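
One caveat: the `unicode_escape` codec decodes any byte that is not part of an escape as Latin-1, so the approach above garbles input that mixes real non-ASCII characters with escape text. A possible workaround, sketched here with an illustrative sample string, is to turn everything into escapes first with `backslashreplace`.

```python
mixed = '天气 \\u4e0d\\u9519'   # real characters followed by escape text

# Naive round trip: the UTF-8 bytes of 天气 are misread as Latin-1
naive = mixed.encode().decode('unicode_escape')

# Escape the non-Latin-1 characters first, then decode all escapes together
safe = mixed.encode('latin-1', 'backslashreplace').decode('unicode_escape')

print(safe)           # 天气 不错
print(naive == safe)  # False
```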

## 4.Meaning of the r/b/u/f string prefixes

```python
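b'input\n' # bytes literal; its repr starts with b. repr: b'input\n'
r'input\n' # raw string: the backslash is not an escape, so \n is the two characters '\\' and 'n', not a newline. repr: 'input\\n'
u'input\n' # unicode string; identical to a plain str in Python 3, where str is always unicode. repr: 'input\n'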
```
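
The difference between the prefixes is easiest to see by comparing lengths and values. A small check along these lines, with illustrative variable names:

```python
plain = 'input\n'   # 5 letters plus one newline character
raw = r'input\n'    # 5 letters plus a backslash and the letter n

print(len(plain))                   # 6
print(len(raw))                     # 7
print(raw == 'input\\n')            # True
print(u'input\n' == plain)          # True: u changes nothing in Python 3
print(type(b'input\n').__name__)    # bytes
print(rb'input\n' == b'input\\n')   # True: prefixes r and b can be combined
```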

The f prefix:

```python
import time
t0 = time.time()
time.sleep(1)
name = 'processing'
print(f'{name} done in {time.time() - t0:.2f} s')  # the f prefix allows Python expressions inside braces in the string
# Output (the elapsed time varies slightly):
# processing done in 1.00 s
```
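
Beyond plain expressions, the braces accept a conversion such as `!r` and a format spec after the colon. A brief sketch with fixed, made-up values so that the output is reproducible:

```python
name = 'processing'
elapsed = 1.0049    # illustrative value instead of a real timing

print(f'{name!r}')                 # 'processing' (repr, with quotes)
print(f'{elapsed:.2f}')            # 1.00
print(f'{name:>12}|')              #   processing| (right-aligned in 12 columns)
print(f'{{name}} -> {name}')       # {name} -> processing (doubled braces are literal)
print(f'{elapsed * 1000:.0f} ms')  # 1005 ms
```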

Revision history (Signed-off-by: rick.chan <chenyang@autoai.com>):

* 2020-10-13 13:34:32 +08:00: Add conversion between Python str and bytes.
* 2020-10-13 14:26:49 +08:00: Add more content.
* 2020-11-23 18:22:35 +08:00: Rename the file and add the meaning of the r/b/u/f prefixes.