Reading and Writing Configuration Files in Python

2023-10-18


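A configuration file is a file that a program reads at runtime to obtain its settings, which keeps configuration separate from code. The benefits are clear. When you contribute source code to the open-source community, you can read sensitive information from a config file and leave that file out of the commit, so your username, password and other secrets are not leaked. A config file can also hold intermediate results produced while the program runs. Writing environment information (such as the operating system type) into a config file makes a program more portable and more general.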

Basic configuration files

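Python's built-in configuration parser module, configparser, provides the ConfigParser class for parsing basic configuration files. With it you can write Python programs that end users can easily customize through a config file.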

A typical pip configuration file looks like this:

[global]
index-url=https://mirrors.aliyun.com/pypi/simple/

[install]
trusted-host=mirrors.aliyun.com

Now let's write a program that reads this configuration file (read_conf.py):

import configparser


config = configparser.ConfigParser()              # instantiate the ConfigParser class

config.read('/Users/wanwu/.pip/pip.conf')         # read the config file (adjust the path for your system)
print("Iterating over the configuration:")

for section in config.sections():                 # first get the sections
    print(f"section is [{section}]")
    for key in config[section]:                   # iterate over each section's keys and values
        print(f"key is [{key}], value is [{config[section][key]}]")

print("Looking up values by key")
print(f"index-url is [{config['global']['index-url']}]")
print(f"trusted-host is [{config['install']['trusted-host']}]")

Run the code from the command line. The output is:

Iterating over the configuration:
section is [global]
key is [index-url], value is [https://mirrors.aliyun.com/pypi/simple/]
section is [install]
key is [trusted-host], value is [mirrors.aliyun.com]
Looking up values by key
index-url is [https://mirrors.aliyun.com/pypi/simple/]
trusted-host is [mirrors.aliyun.com]
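Note that config.read() does not raise an error when the file is missing: it returns the list of files it actually parsed, so a wrong path only surfaces later as a KeyError on config['global']. A minimal sketch of a more defensive read, assuming the pip.conf content above is embedded as a string (the PIP_CONF name, the /nonexistent/pip.conf path and the timeout option are illustrative only):

```python
import configparser

# Same content as the pip.conf shown above, embedded as a string for the demo
PIP_CONF = """
[global]
index-url=https://mirrors.aliyun.com/pypi/simple/

[install]
trusted-host=mirrors.aliyun.com
"""

config = configparser.ConfigParser()

# read() silently skips files it cannot open and returns the ones it parsed
loaded = config.read("/nonexistent/pip.conf")
print(loaded)  # []

# read_string() parses configuration text that is already in memory
config.read_string(PIP_CONF)

# fallback is returned when the option is missing (illustrative "timeout" option)
timeout = config.get("global", "timeout", fallback="15")
print(config.get("global", "index-url"))
print(timeout)
```

Checking the return value of read() lets a script fail early with a clear message instead of a KeyError later.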

Writing settings to a configuration file (write_conf.py)

import configparser

config = configparser.ConfigParser()
config["DEFAULT"] = {
    "ServerAliveInterval": "45",
    "Compression": "yes",
    "CompressionLevel": "9",
}

config["bitbucket.org"] = {}
config["bitbucket.org"]["User"] = "hg"

config["topsecret.server.com"] = {}
topsecret = config["topsecret.server.com"]
topsecret["Port"] = "50022"
topsecret["ForwardX11"] = "no"

config["DEFAULT"]["ForwardX11"] = "yes"

with open("example.conf", "w") as configfile:       # write the config object above to a file
    config.write(configfile)

with open("example.conf", "r") as f:                # read example.conf back to check what was written
    print(f.read())

The code above instantiates ConfigParser, adds the settings, and finally writes them to a configuration file. Running it prints:

[DEFAULT]
serveraliveinterval = 45
compression = yes
compressionlevel = 9
forwardx11 = yes

[bitbucket.org]
user = hg

[topsecret.server.com]
port = 50022
forwardx11 = no

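As the read and write examples show, the configparser interface is direct and explicit. Note the following points:

Section names are case-sensitive.
Option names (the keys inside a section) are case-insensitive: they are converted to lowercase, so config["bitbucket.org"]["User"] is saved to the file as user.
Option values are untyped: they are all strings and must be converted before use, e.g. int(config["topsecret.server.com"]["Port"]) gives the integer 50022. For common conversions the parser provides getboolean(), getint() and getfloat(); for example, config["DEFAULT"].getboolean('Compression') returns the bool True. You can also register your own converters or customize the provided ones.
The section named [DEFAULT] is special: every other section inherits its options. In this example, config["bitbucket.org"]["ServerAliveInterval"] returns the string "45".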
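The four points above can be checked directly. A sketch that rebuilds the write_conf.py settings in memory instead of reading example.conf; the "list" converter and the [app] section with its hosts option are hypothetical additions, used only to show the converters parameter of ConfigParser:

```python
import configparser

# Rebuild the settings from write_conf.py in memory (no file needed)
config = configparser.ConfigParser()
config["DEFAULT"] = {"ServerAliveInterval": "45", "Compression": "yes",
                     "CompressionLevel": "9", "ForwardX11": "yes"}
config["bitbucket.org"] = {"User": "hg"}
config["topsecret.server.com"] = {"Port": "50022", "ForwardX11": "no"}

# 1. Section names are case-sensitive
print("bitbucket.org" in config, "BITBUCKET.ORG" in config)  # True False

# 2. Option names are case-insensitive (stored lowercased)
print(config["bitbucket.org"]["USER"])  # hg

# 3. Values are strings; the typed getters convert them
port = config["topsecret.server.com"].getint("Port")
compression = config["DEFAULT"].getboolean("Compression")
print(type(port).__name__, port)  # int 50022

# 4. [DEFAULT] options are inherited, but a section's own value wins
print(config["bitbucket.org"]["ServerAliveInterval"])           # 45
print(config["topsecret.server.com"].getboolean("ForwardX11"))  # False
print(config.sections())  # DEFAULT itself is not listed

# Hypothetical custom converter: registers getlist() on the parser and its sections
parser = configparser.ConfigParser(
    converters={"list": lambda s: [x.strip() for x in s.split(",")]}
)
parser.read_string("[app]\nhosts = a.example.com, b.example.com\n")
print(parser["app"].getlist("hosts"))
```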

Parsing XML files

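XML stands for eXtensible Markup Language. It is a markup language for marking up electronic documents so that they have structure. A file that stores data in XML form is an XML file; XML was designed to transport and store data.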

For example, consider an XML file with the following content:
<note>
<to>George</to>
<from>John</from>
<heading>Reminder</heading>
<body>Don't forget the meeting!</body>
</note>
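It represents a note from John to George, with the heading Reminder and the body "Don't forget the meeting!". XML itself does not define tags such as note, to and from; they were chosen by whoever wrote the file, yet their meaning is still easy to understand. An XML document does not do anything by itself: it is just information wrapped in XML tags. Writing a program to extract the document's structure and content is what parsing an XML file means.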

Python offers three ways to parse XML: SAX, DOM and ElementTree.

SAX (Simple API for XML)

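SAX is an event-driven API. Using it involves two parts: a parser and an event handler. The parser reads the XML file and sends events (such as "element started" and "element ended") to the event handler, which responds to each event and processes the data. The workflow is: create a new XMLReader object, set its event handler (a ContentHandler), and finally call the XMLReader's parse() method.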

Create a new XMLReader object with xml.sax.make_parser([parser_list]). parser_list is optional: a list of parser module names to try first.

Define a custom event handler that inherits from the ContentHandler class. Its methods are listed in the table below.

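Method	Description
characters(content)	Called for character data outside the markup: text between two tags (start or end tags), including whitespace and line breaks. content holds that text; a long run of text may be delivered in several calls
startDocument()	Called once when parsing of the document begins
endDocument()	Called once when the parser reaches the end of the document
startElement(name, attrs)	Called at each XML start tag; name is the tag name and attrs is a dict-like object holding the tag's attributes
endElement(name)	Called at each XML end tag; name is the tag name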
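To see the order in which these callbacks fire, a minimal sketch records every event for a tiny note document. The EventLogger class and the one-line XML are illustrative; xml.sax.parseString() is used so that no file is needed:

```python
import xml.sax


class EventLogger(xml.sax.handler.ContentHandler):
    """Records each callback so the call order is visible."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        # attrs is a dict-like Attributes object, so dict() can copy it
        self.events.append(f"startElement {name} {dict(attrs)}")

    def endElement(self, name):
        self.events.append(f"endElement {name}")

    def characters(self, content):
        # Skip whitespace-only text to keep the log readable
        if content.strip():
            self.events.append(f"characters {content!r}")


handler = EventLogger()
xml.sax.parseString(b'<note id="1"><to>George</to></note>', handler)
for event in handler.events:
    print(event)
```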

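Call the XMLReader's parse() method. The xml.sax module also offers a convenience function that creates the reader, installs the handlers and parses in one step:

xml.sax.parse(source, handler[, error_handler])

Parameters:

source: the XML file name or a file object; to parse XML held in a string, use xml.sax.parseString(string, handler[, error_handler]) instead
handler: must be a ContentHandler object
error_handler: optional; if given, it must be an ErrorHandler object (xml.sax.handler.ErrorHandler). The default handler raises a SAXParseException on errors.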
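A sketch of the optional error handler, assuming a deliberately malformed document in which <to> is never closed; LoggingErrorHandler is a hypothetical name. The handler records each problem and re-raises errors, so the caller can still catch xml.sax.SAXParseException:

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class LoggingErrorHandler(ErrorHandler):
    """Hypothetical handler: records each problem, then decides whether to stop."""

    def __init__(self):
        self.messages = []

    def warning(self, exception):
        self.messages.append(f"warning: {exception}")  # keep parsing

    def error(self, exception):
        self.messages.append(f"error: {exception}")
        raise exception

    def fatalError(self, exception):
        self.messages.append(f"fatal: {exception}")
        raise exception  # a document that is not well-formed cannot be parsed further


errors = LoggingErrorHandler()
failed = False
try:
    # <to> is never closed, so the document is not well-formed
    xml.sax.parseString(b"<note><to>George</note>", ContentHandler(), errors)
except xml.sax.SAXParseException as exc:
    failed = True
    print("parse failed at line", exc.getLineNumber())
print(errors.messages)
```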

Let's look at an example of parsing XML. The content of example.xml is as follows:

<breakfast_menu menu_year="2018">
    <food>
        <name>Belgian Waffles</name>
        <price>$5.95</price>
        <description>
            two of our famous Belgian Waffles with plenty of real maple syrup
        </description>
        <calories>650</calories>
    </food>
    <food>
        <name>Strawberry Belgian Walffles</name>
        <price>$7.95</price>
        <description>
            light Belgian waffles covered with strawberries and whipped cream
        </description>
        <calories>900</calories>
    </food>
    <food>
        <name>Berry-Berry Belgian Waffles</name>
        <price>$8.95</price>
        <description>
            light Belgian waffles covered with an assortment of fresh berries and whipped cream
        </description>
        <calories>900</calories>
    </food>
    <food>
        <name>French Toast</name>
        <price>$4.50</price>
        <description>
            thick slices made from our homemade sourdough bread
        </description>
        <calories>600</calories>
    </food>
    <food>
        <name>Homestyle Breakfast</name>
        <price>$6.95</price>
        <description>
            two eggs, bacon or sausage, toast, and our ever-popular hash browns
        </description>
        <calories>950</calories>
    </food>
</breakfast_menu>

The content of read_xml.py is as follows:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# @Author: wanwu
# @Email: 2350686113@qq.com
# @Date: 2023/3/21
# @Last modified by: wanwu
# @Last modified time: 2023/3/21
# @Descriptions: read an XML configuration file

import xml.sax


class MenuHandler(xml.sax.handler.ContentHandler):

    def __init__(self):
        super().__init__()
        self.CurrentData = ""
        self.name = ""
        self.price = ""
        self.description = ""
        self.calories = ""

    def startElement(self, tag, attributes):
        """ Called at the start of an element """
        self.CurrentData = tag
        if tag == "breakfast_menu":
            print("This is a breakfast menu")
            year = attributes["menu_year"]
            print(f"Year: {year}\n")

    def characters(self, content):
        """ Called when character data is read """
        if self.CurrentData == "name":
            self.name = content
        elif self.CurrentData == "price":
            self.price = content
        elif self.CurrentData == "description":
            self.description += content     # the text spans several lines, so accumulate it; it is cleared after printing
        elif self.CurrentData == "calories":
            self.calories = content
        else:
            pass

    def endElement(self, tag):
        """ Called at the end of an element """
        if self.CurrentData == "name":
            print(f"name: {self.name}")
        elif self.CurrentData == "price":
            print(f"price: {self.price}")
        elif self.CurrentData == "description":
            print(f"description: {self.description}")
            self.description = ""   # reset the accumulated text for the next <description>
        elif self.CurrentData == "calories":
            print(f"calories: {self.calories}")
        else:
            pass

        self.CurrentData = ""


if __name__ == "__main__":
    # Create an XMLReader
    parser = xml.sax.make_parser()

    # Install the custom ContentHandler
    Handler = MenuHandler()
    parser.setContentHandler(Handler)

    parser.parse("example.xml")

Code notes: read_xml.py defines a custom MenuHandler class that inherits from xml.sax.handler.ContentHandler and overrides its methods to handle each tag. The main block first obtains an XMLReader, sets its event handler to the custom MenuHandler, and then calls parse() to parse example.xml. The output is:

This is a breakfast menu
Year: 2018

name: Belgian Waffles
price: $5.95
description: 
            two of our famous Belgian Waffles with plenty of real maple syrup
        
calories: 650
name: Strawberry Belgian Walffles
price: $7.95
description: 
            light Belgian waffles covered with strawberries and whipped cream
        
calories: 900
name: Berry-Berry Belgian Waffles
price: $8.95
description: 
            light Belgian waffles covered with an assortment of fresh berries and whipped cream
        
calories: 900
name: French Toast
price: $4.50
description: 
            thick slices made from our homemade sourdough bread
        
calories: 600
name: Homestyle Breakfast
price: $6.95
description: 
            two eggs, bacon or sausage, toast, and our ever-popular hash browns
        
calories: 950

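SAX uses an event-driven model: while parsing, it fires one event after another and calls the user-defined callbacks to handle them, processing one tag at a time. Because it never has to load the whole XML document up front, it uses little memory and is efficient. It suits the following scenarios:

processing large files;
when you only need part of the file, or only need to extract specific information from it;
when you want to build your own object model.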
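For the last scenario, a sketch of a handler that builds a list of dicts instead of printing as it goes. MenuBuilder is a hypothetical name and the input is one <food> item trimmed from example.xml. It also buffers text in characters(), because the parser may deliver an element's text in several chunks:

```python
import xml.sax

MENU_XML = b"""<breakfast_menu menu_year="2018">
    <food><name>French Toast</name><price>$4.50</price>
        <description>
            thick slices made from our homemade sourdough bread
        </description>
        <calories>600</calories></food>
</breakfast_menu>"""


class MenuBuilder(xml.sax.handler.ContentHandler):
    """Collects each <food> element into a dict instead of printing it."""

    FIELDS = {"name", "price", "description", "calories"}

    def __init__(self):
        super().__init__()
        self.year = None
        self.foods = []        # the resulting object model
        self._current = None   # dict for the <food> being parsed
        self._buffer = []      # text chunks of the current element

    def startElement(self, name, attrs):
        if name == "breakfast_menu":
            self.year = int(attrs.get("menu_year"))
        elif name == "food":
            self._current = {}
        self._buffer = []

    def characters(self, content):
        self._buffer.append(content)  # may be called several times per element

    def endElement(self, name):
        if name in self.FIELDS and self._current is not None:
            self._current[name] = "".join(self._buffer).strip()
        elif name == "food":
            self._current["calories"] = int(self._current["calories"])
            self.foods.append(self._current)
            self._current = None
        self._buffer = []


builder = MenuBuilder()
xml.sax.parseString(MENU_XML, builder)
print(builder.year, builder.foods)
```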

DOM (Document Object Model)

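The Document Object Model (DOM) is the standard programming interface recommended by the W3C for processing XML. A DOM parser reads the entire XML document at once and stores all of its elements in an in-memory tree. You can then use the functions DOM provides to read or modify the document's content and structure, and write the modified content back to an XML file.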
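A minimal sketch of the modify-and-write-back workflow with xml.dom.minidom, assuming a trimmed menu embedded as a string; the new year 2024, the price $4.95 and the added <calories> element are illustrative values:

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString(
    '<breakfast_menu menu_year="2018">'
    "<food><name>French Toast</name><price>$4.50</price></food>"
    "</breakfast_menu>"
)
root = doc.documentElement

# Change an attribute and the text of an existing element
root.setAttribute("menu_year", "2024")
price = root.getElementsByTagName("price")[0]
price.firstChild.data = "$4.95"

# Append a new <calories> element to the first <food>
calories = doc.createElement("calories")
calories.appendChild(doc.createTextNode("600"))
root.getElementsByTagName("food")[0].appendChild(calories)

# Serialize in memory; doc.writexml(f) writes to an open file instead
xml_text = root.toxml()
print(xml_text)
```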

The following example parses the XML file with xml.dom.minidom.

# -*- coding:utf-8 -*-
import xml.dom.minidom


# Open the XML document with the minidom parser
DOMTree = xml.dom.minidom.parse("example.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("menu_year"):
    print(f"This is a breakfast menu\nYear: {collection.getAttribute('menu_year')}")

# Get every <food> element in the menu
foods = collection.getElementsByTagName("food")
# Print the details of each item
for food in foods:
    name = food.getElementsByTagName("name")[0]
    print(f"name: {name.childNodes[0].data}")
    price = food.getElementsByTagName("price")[0]
    print(f"price: {price.childNodes[0].data}")
    description = food.getElementsByTagName("description")[0]
    print(f"description: {description.childNodes[0].data.strip()}")
    calories = food.getElementsByTagName("calories")[0]
    print(f"calories: {calories.childNodes[0].data}")

Code notes: the script opens the XML document with the minidom parser, uses getElementsByTagName to collect all <food> elements, and then looks up each child tag by name. The logic is more direct than the SAX callbacks.

The output is:

This is a breakfast menu
Year: 2018
name: Belgian Waffles
price: $5.95
description: two of our famous Belgian Waffles with plenty of real maple syrup
calories: 650
name: Strawberry Belgian Walffles
price: $7.95
description: light Belgian waffles covered with strawberries and whipped cream
calories: 900
name: Berry-Berry Belgian Waffles
price: $8.95
description: light Belgian waffles covered with an assortment of fresh berries and whipped cream
calories: 900
name: French Toast
price: $4.50
description: thick slices made from our homemade sourdough bread
calories: 600
name: Homestyle Breakfast
price: $6.95
description: two eggs, bacon or sausage, toast, and our ever-popular hash browns
calories: 950

ElementTree

ElementTree (the xml.etree.ElementTree module) parses XML data into an in-memory tree, and you work with the XML through that tree.

The following example parses the XML with ElementTree:

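# -*- coding:utf-8 -*-
import xml.etree.ElementTree as ET


tree = ET.parse("example.xml")
root = tree.getroot()
print(f"This is a breakfast menu\nYear: {root.attrib['menu_year']}")

for child in root:                                  # each child is a <food> element
    print("name: ", child.find("name").text)        # look up by tag name rather than position
    print("price: ", child.find("price").text)
    print("description: ", child.find("description").text.strip())
    print("calories: ", child.find("calories").text)

The code is quite concise. The output is:

This is a breakfast menu
Year: 2018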
name:  Belgian Waffles
price:  $5.95
description:  two of our famous Belgian Waffles with plenty of real maple syrup
calories:  650
name:  Strawberry Belgian Walffles
price:  $7.95
description:  light Belgian waffles covered with strawberries and whipped cream
calories:  900
name:  Berry-Berry Belgian Waffles
price:  $8.95
description:  light Belgian waffles covered with an assortment of fresh berries and whipped cream
calories:  900
name:  French Toast
price:  $4.50
description:  thick slices made from our homemade sourdough bread
calories:  600
name:  Homestyle Breakfast
price:  $6.95
description:  two eggs, bacon or sausage, toast, and our ever-popular hash browns
calories:  950
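ElementTree can also search and modify the tree. A sketch, assuming a two-item menu embedded as a string; the 620-calorie threshold and the Fruit Salad item are made-up values for illustration:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<breakfast_menu menu_year="2018">'
    "<food><name>Belgian Waffles</name><calories>650</calories></food>"
    "<food><name>French Toast</name><calories>600</calories></food>"
    "</breakfast_menu>"
)

# findall() accepts a limited XPath subset; findtext() returns a child's text
names = [food.findtext("name") for food in root.findall("food")]
light = [f.findtext("name") for f in root.findall("food")
         if int(f.findtext("calories")) < 620]

# Modify the tree: change an attribute and append a new <food>
root.set("menu_year", "2024")
new_food = ET.SubElement(root, "food")
ET.SubElement(new_food, "name").text = "Fruit Salad"
ET.SubElement(new_food, "calories").text = "200"

# tostring() serializes in memory; ET.ElementTree(root).write(path) writes a file
xml_text = ET.tostring(root, encoding="unicode")
print(names, light)
print(xml_text)
```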