Chapter 10: Dictionaries | Python for Everybody 讲义笔记_En

文章目录

  • Python for Everybody
    • 课程简介
    • Dictionaries
      • Dictionaries
      • Dictionary as a set of counters
      • Dictionaries and files
      • Looping and dictionaries
      • Advanced text parsing
      • Debugging
      • Glossary


Python for Everybody

Exploring Data Using Python 3
Dr. Charles R. Severance


课程简介

Python for Everybody 零基础程序设计(Python 入门)

  • This course aims to teach everyone the basics of programming computers using Python. 本课程旨在向所有人传授使用 Python 进行计算机编程的基础知识。
  • We cover the basics of how one constructs a program from a series of simple instructions in Python. 我们介绍了如何通过 Python 中的一系列简单指令构建程序的基础知识。
  • The course has no pre-requisites and avoids all but the simplest mathematics. Anyone with moderate computer experience should be able to master the materials in this course. 该课程没有任何先决条件,除了最简单的数学之外,避免了所有内容。任何具有中等计算机经验的人都应该能够掌握本课程中的材料。
  • This course will cover Chapters 1-5 of the textbook “Python for Everybody”. Once a student completes this course, they will be ready to take more advanced programming courses. 本课程将涵盖《Python for Everyday》教科书的第 1-5 章。学生完成本课程后,他们将准备好学习更高级的编程课程。
  • This course covers Python 3.

在这里插入图片描述

coursera

Python for Everybody 零基础程序设计(Python 入门)

Charles Russell Severance
Clinical Professor

在这里插入图片描述

个人主页
Twitter

在这里插入图片描述

University of Michigan


课程资源

coursera原版课程视频
coursera原版视频-中英文精校字幕-B站
Dr. Chuck官方翻录版视频-机器翻译字幕-B站

PY4E-课程配套练习
Dr. Chuck Online - 系列课程开源官网



Dictionaries

The dictionary data structures allows us to store multiple values in an object and look up the values by their key.


Dictionaries

A dictionary is like a list, but more general. In a list, the index positions have to be integers; in a dictionary, the indices can be (almost) any type.

You can think of a dictionary as a mapping between a set of indices (which are called keys) and a set of values. Each key maps to a value. The association of a key and a value is called a key-value pair or sometimes an item.

As an example, we’ll build a dictionary that maps from English to Spanish words, so the keys and the values are all strings.

The function dict creates a new dictionary with no items. Because dict is the name of a built-in function, you should avoid using it as a variable name.

>>> eng2sp = dict()
>>> print(eng2sp)
{}

The curly brackets, {}, represent an empty dictionary. To add items to the dictionary, you can use square brackets:

>>> eng2sp['one'] = 'uno'

This line creates an item that maps from the key 'one' to the value “uno”. If we print the dictionary again, we see a key-value pair with a colon between the key and value:

>>> print(eng2sp)
{'one': 'uno'}

This output format is also an input format. For example, you can create a new dictionary with three items.

>>> eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
>>> print(eng2sp)
{'one': 'uno', 'two': 'dos', 'three': 'tres'}

Since Python 3.7x the order of key-value pairs is the same as their input order, i.e. dictionaries are now ordered structures.

But that doesn’t really matter because the elements of a dictionary are never indexed with integer indices. Instead, you use the keys to look up the corresponding values:

>>> print(eng2sp['two'])
'dos'

The key 'two' always maps to the value “dos” so the order of the items doesn’t matter.

If the key isn’t in the dictionary, you get an exception:

>>> print(eng2sp['four'])
KeyError: 'four'

The len function works on dictionaries; it returns the number of key-value pairs:

>>> len(eng2sp)
3

The in operator works on dictionaries; it tells you whether something appears as a key in the dictionary (appearing as a value is not good enough).

>>> 'one' in eng2sp
True
>>> 'uno' in eng2sp
False

To see whether something appears as a value in a dictionary, you can use the method values, which returns the values as a type that can be converted to a list, and then use the in operator:

>>> vals = list(eng2sp.values())
>>> 'uno' in vals
True

The in operator uses different algorithms for lists and dictionaries. For lists, it uses a linear search algorithm. As the list gets longer, the search time gets longer in direct proportion to the length of the list. For dictionaries, Python uses an algorithm called a hash table that has a remarkable property: the in operator takes about the same amount of time no matter how many items there are in a dictionary. I won’t explain why hash functions are so magical, but you can read more about it at wikipedia.org/wiki/Hash_table.

Exercise 1:
Download a copy of the file www.py4e.com/code3/words.txt
Write a program that reads the words in words.txt and stores them as keys in a dictionary. It doesn’t matter what the values are. Then you can use the in operator as a fast way to check whether a string is in the dictionary.

Dictionary as a set of counters

Suppose you are given a string and you want to count how many times each letter appears. There are several ways you could do it:

  1. You could create 26 variables, one for each letter of the alphabet. Then you could traverse the string and, for each character, increment the corresponding counter, probably using a chained conditional.

  2. You could create a list with 26 elements. Then you could convert each character to a number (using the built-in function ord), use the number as an index into the list, and increment the appropriate counter.

  3. You could create a dictionary with characters as keys and counters as the corresponding values. The first time you see a character, you would add an item to the dictionary. After that you would increment the value of an existing item.

Each of these options performs the same computation, but each of them implements that computation in a different way.

An implementation is a way of performing a computation; some implementations are better than others. For example, an advantage of the dictionary implementation is that we don’t have to know ahead of time which letters appear in the string and we only have to make room for the letters that do appear.

Here is what the code might look like:

word = 'brontosaurus'
d = dict()
for c in word:
    if c not in d:
        d[c] = 1
    else:
        d[c] = d[c] + 1
print(d)

We are effectively computing a histogram, which is a statistical term for a set of counters (or frequencies).

The for loop traverses the string. Each time through the loop, if the character c is not in the dictionary, we create a new item with key c and the initial value 1 (since we have seen this letter once). If c is already in the dictionary we increment d[c].

Here’s the output of the program:

{'a': 1, 'b': 1, 'o': 2, 'n': 1, 's': 2, 'r': 2, 'u': 2, 't': 1}

The histogram indicates that the letters “a” and “b” appear once; “o” appears twice, and so on.

Dictionaries have a method called get that takes a key and a default value. If the key appears in the dictionary, get returns the corresponding value; otherwise it returns the default value. For example:

>>> counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
>>> print(counts.get('jan', 0))
100
>>> print(counts.get('tim', 0))
0

We can use get to write our histogram loop more concisely. Because the get method automatically handles the case where a key is not in a dictionary, we can reduce four lines down to one and eliminate the if statement.

word = 'brontosaurus'
d = dict()
for c in word:
    d[c] = d.get(c,0) + 1
print(d)

The use of the get method to simplify this counting loop ends up being a very commonly used “idiom” in Python and we will use it many times in the rest of the book. So you should take a moment and compare the loop using the if statement and in operator with the loop using the get method. They do exactly the same thing, but one is more succinct.

Dictionaries and files

One of the common uses of a dictionary is to count the occurrence of words in a file with some written text. Let’s start with a very simple file of words taken from the text of Romeo and Juliet.

For the first set of examples, we will use a shortened and simplified version of the text with no punctuation. Later we will work with the text of the scene with punctuation included.

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief

We will write a Python program to read through the lines of the file, break each line into a list of words, and then loop through each of the words in the line and count each word using a dictionary.

You will see that we have two for loops. The outer loop is reading the lines of the file and the inner loop is iterating through each of the words on that particular line. This is an example of a pattern called nested loops because one of the loops is the outer loop and the other loop is the inner loop.

Because the inner loop executes all of its iterations each time the outer loop makes a single iteration, we think of the inner loop as iterating “more quickly” and the outer loop as iterating more slowly.

The combination of the two nested loops ensures that we will count every word on every line of the input file.

fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()

counts = dict()
for line in fhand:
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1

print(counts)

# Code: http://www.py4e.com/code3/count1.py

In our else statement, we use the more compact alternative for incrementing a variable. counts[word] += 1 is equivalent to counts[word] = counts[word] + 1. Either method can be used to change the value of a variable by any desired amount. Similar alternatives exist for -=, *=, and /=.

When we run the program, we see a raw dump of all of the counts in unsorted hash order. (the romeo.txt file is available at www.py4e.com/code3/romeo.txt)

python count1.py
Enter the file name: romeo.txt
{'But': 1, 'soft': 1, 'what': 1, 'light': 1, 'through': 1, 'yonder': 1,
'window': 1, 'breaks': 1, 'It': 1, 'is': 3, 'the': 3, 'east': 1, 'and': 3,
'Juliet': 1, 'sun': 2, 'Arise': 1, 'fair': 1, 'kill': 1, 'envious': 1,
'moon': 1, 'Who': 1, 'already': 1, 'sick': 1, 'pale': 1, 'with': 1,
'grief': 1}

It is a bit inconvenient to look through the dictionary to find the most common words and their counts, so we need to add some more Python code to get us the output that will be more helpful.

Looping and dictionaries

If you use a dictionary as the sequence in a for statement, it traverses the keys of the dictionary. This loop prints each key and the corresponding value:

counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:
    print(key, counts[key])

Here’s what the output looks like:

chuck 1
annie 42
jan 100

Again, the keys are ordered.

We can use this pattern to implement the various loop idioms that we have described earlier. For example if we wanted to find all the entries in a dictionary with a value above ten, we could write the following code:

counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:
    if counts[key] > 10 :
        print(key, counts[key])

The for loop iterates through the keys of the dictionary, so we must use the index operator to retrieve the corresponding value for each key. Here’s what the output looks like:

annie 42
jan 100

We see only the entries with a value above 10.

If you want to print the keys in alphabetical order, you first make a list of the keys in the dictionary using the keys method available in dictionary objects, and then sort that list and loop through the sorted list, looking up each key and printing out key-value pairs in sorted order as follows:

counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
lst = list(counts.keys())
print(lst)
lst.sort()
for key in lst:
    print(key, counts[key])

Here’s what the output looks like:

['chuck', 'annie', 'jan']
annie 42
chuck 1
jan 100

First you see the list of keys in non-alphabetical order that we get from the keys method. Then we see the key-value pairs in alphabetical order from the for loop.

Advanced text parsing

In the above example using the file romeo.txt, we made the file as simple as possible by removing all punctuation by hand. The actual text has lots of punctuation, as shown below.

But, soft! what light through yonder window breaks?
It is the east, and Juliet is the sun.
Arise, fair sun, and kill the envious moon,
Who is already sick and pale with grief,

Since the Python split function looks for spaces and treats words as tokens separated by spaces, we would treat the words “soft!” and “soft” as different words and create a separate dictionary entry for each word.

Also since the file has capitalization, we would treat “who” and “Who” as different words with different counts.

We can solve both these problems by using the string methods lower, punctuation, and translate. The translate is the most subtle of the methods. Here is the documentation for translate:

line.translate(str.maketrans(fromstr, tostr, deletestr))

Replace the characters in fromstr with the character in the same position in tostr and delete all characters that are in deletestr. The fromstr and tostr can be empty strings and the deletestr parameter can be omitted.

We will not specify the tostr but we will use the deletestr parameter to delete all of the punctuation. We will even let Python tell us the list of characters that it considers “punctuation”:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

The parameters used by translate were different in Python 2.0.

We make the following modifications to our program:

import string

fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()

counts = dict()
for line in fhand:
    line = line.rstrip()
    line = line.translate(line.maketrans('', '', string.punctuation))
    line = line.lower()
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1

print(counts)

# Code: http://www.py4e.com/code3/count2.py

Part of learning the “Art of Python” or “Thinking Pythonically” is realizing that Python often has built-in capabilities for many common data analysis problems. Over time, you will see enough example code and read enough of the documentation to know where to look to see if someone has already written something that makes your job much easier.

The following is an abbreviated version of the output:

Enter the file name: romeo-full.txt
{'romeo': 40, 'and': 42, 'juliet': 32, 'act': 1, '2': 2, 'scene': 2,
'ii': 1, 'capulets': 1, 'orchard': 2, 'enter': 1, 'he': 5, 'jests': 1,
'at': 9, 'scars': 1, 'that': 30, 'never': 2, 'felt': 1, 'a': 24, 'wound': 1,
'appears': 1, 'above': 6, 'window': 2, 'but': 18, 'soft': 1, 'what': 11,
'light': 5, 'through': 2, 'yonder': 2, 'breaks': 1, ...}

Looking through this output is still unwieldy and we can use Python to give us exactly what we are looking for, but to do so, we need to learn about Python tuples. We will pick up this example once we learn about tuples.

Debugging

As you work with bigger datasets it can become unwieldy to debug by printing and checking data by hand. Here are some suggestions for debugging large datasets:

Scale down the input
If possible, reduce the size of the dataset. For example if the program reads a text file, start with just the first 10 lines, or with the smallest example you can find. You can either edit the files themselves, or (better) modify the program so it reads only the first n lines.

If there is an error, you can reduce n to the smallest value that manifests the error, and then increase it gradually as you find and correct errors.

Check summaries and types
Instead of printing and checking the entire dataset, consider printing summaries of the data: for example, the number of items in a dictionary or the total of a list of numbers.

A common cause of runtime errors is a value that is not the right type. For debugging this kind of error, it is often enough to print the type of a value.

Write self-checks
Sometimes you can write code to check for errors automatically. For example, if you are computing the average of a list of numbers, you could check that the result is not greater than the largest element in the list or less than the smallest. This is called a “sanity check” because it detects results that are “completely illogical”.

Another kind of check compares the results of two different computations to see if they are consistent. This is called a “consistency check”.

Pretty print the output
Formatting debugging output can make it easier to spot an error.
Again, time you spend building scaffolding can reduce the time you spend debugging.

Glossary

dictionary
A mapping from a set of keys to their corresponding values.
hashtable
The algorithm used to implement Python dictionaries.
hash function
A function used by a hashtable to compute the location for a key.
histogram
A set of counters.
implementation
A way of performing a computation.
item
Another name for a key-value pair.
key
An object that appears in a dictionary as the first part of a key-value pair.
key-value pair
The representation of the mapping from a key to a value.
lookup
A dictionary operation that takes a key and finds the corresponding value.
nested loops
When there are one or more loops “inside” of another loop. The inner loop runs to completion each time the outer loop runs once.
value
An object that appears in a dictionary as the second part of a key-value pair. This is more specific than our previous use of the word “value”.

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:/a/58972.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈qq邮箱809451989@qq.com,一经查实,立即删除!

相关文章

Spring:JDBCTemplate

JDBCTemplate 概述 概述 JDBC&#xff08;Java DataBase Connectivity&#xff0c;Java 数据库连接&#xff09;&#xff0c; 一 种用于执行 SQL 语句的 Java API&#xff08;Application Programming Interface &#xff0c; 应用程序设计接口 &#xff09;&#xff0c;可以为…

计算机网络(2) --- 网络套接字UDP

计算机网络&#xff08;1&#xff09; --- 网络介绍_哈里沃克的博客-CSDN博客https://blog.csdn.net/m0_63488627/article/details/131967378?spm1001.2014.3001.5501 目录 1.端口号 2.TCP与UDP协议 1.TCP协议介绍 1.TCP协议 2.UDP协议 3.理解 2.网络字节序 发送逻辑…

小白到运维工程师自学之路 第六十六集 (docker 网络模型)

一、概述 Docker网络模型是指Docker容器在网络中的通信方式和组织结构。Docker容器通过网络连接&#xff0c;使得容器之间可以相互通信&#xff0c;并与主机和外部网络进行交互。 在Docker中&#xff0c;有几种不同的网络模型可供选择&#xff1a; 1、主机模式&#xff08;H…

网络安全(黑客)自学建议一一附学习路线

温馨提示&#xff1a;为了避免误入歧途&#xff0c;自学请优先看《网络安全法》。 下面是一些学习建议&#xff1a; 1、多请教有经验的人 切忌钻牛角尖&#xff0c;特别是刚入门的什么都不了解的情况下&#xff0c;可能你花好几天研究的一个东西&#xff0c;人10分钟就能搞定…

DBeaver连MySQL库报错public key retrieval is not allowed

连接报错: public key retrieval is not allowed解决办法&#xff1a; 右击你连接的库进行编辑连接&#xff08;或者直接按F4打开编辑&#xff09; 然后点击驱动属性里面进行设置 找到allowPublicKeyRetrieval属性&#xff0c;把值由false改为true 注&#xff1a;连接成功后如…

部署SpringBoot项目在服务器上,并通过swagger登录

1.项目编译打包 2.上传jar包到服务器并启动 xftp将打包好后的jar包传到虚拟机指定路径 java -jar执行该jar包 3.通过swagger登录 输入后点击下面的执行按钮 会得到一个tocken 4.将tocken放到postman的Headers中 5.修改url 例如我本地测试是http://localhost:8080/接口名&am…

【C++】继承的基本特性(定义,赋值转换,友元,静态成员,虚拟继承,默认成员函数,作用域)

文章目录 一、继承的定义1.定义格式2.继承基类成员访问方式的变化 二、基类和派生类对象赋值转换三、继承的作用域1. 在继承体系中基类和派生类都有独立的作用域。2.子类和父类中有同名成员3.成员函数的隐藏4.注意在实际中在继承体系里面最好不要定义同名的成员。 四、派生类的…

【WebRTC---序篇】(七)RTC多人连麦方案

服务端可以选择mediasoup&#xff0c;作为SFU服务器&#xff0c;只负责转发数据 下图举例三个Client (browser或者客户端)同时加入一个房间&#xff0c;每个app同时发布一路视频和一路音频&#xff0c;并且接受来自其他app的音视频流&#xff0c;mediasoup内部的结构如下&…

【诺依管理系统-前端】对话框

1.指定 v-model"openAllScene" 2.按钮代码 3.调用方法handleAddAllScene里面&#xff0c;只是openAllScene的值为true&#xff0c;调用显示对话框

unity tolua热更新框架教程(1)

git GitHub - topameng/tolua: The fastest unity lua binding solution 拉取到本地 使用unity打开&#xff0c;此处使用环境 打开前几个弹窗(管线和api升级)都点确定 修改项目设置 切换到安卓平台尝试打包编译 设置包名 查看报错 打开 屏蔽接口导出 重新生成 编译通过 …

Git常见问题

git clone 提示OpenSSL SSL_read git clone 时提示Connection was reset, errno 10054类错误 fatal: unable to acce ss https://github.com/fex-team/ueditor.git/: OpenSSL SSL_read: Connection was reset, errno 10054 备注&#xff1a;以下方法只是归纳整理&#xff0c;…

JVM GC ROOT分析

GC root原理:通过对枚举GCroot对象做引用可达性分析,即从GC root对象开始,向下搜索,形成的路径称之为 引用链。如果一个对象到GC roots对象没有任何引用,没有形成引用链,那么该对象等待GC回收,换而言之,如果减少内存泄漏,也就是切断引用链,常见的GCRoot对象如下: 1、…

【深度学习】在 MNIST实现自动编码器实践教程

一、说明 自动编码器是一种无监督学习的神经网络模型&#xff0c;主要用于降维或特征提取。常见的自动编码器包括基本的单层自动编码器、深度自动编码器、卷积自动编码器和变分自动编码器等。 其中&#xff0c;基本的单层自动编码器由一个编码器和一个解码器组成&#xff0c;编…

简单易懂的生鲜蔬果小程序开发指南

随着人们对健康意识的提高&#xff0c;越来越多的人开始注重饮食健康&#xff0c;选择新鲜的果蔬产品。为了满足市场需求&#xff0c;制作一个果蔬配送小程序成为了一个不错的选择。本文将详细介绍如何快速制作一个果蔬配送小程序。 第一步&#xff1a;登录乔拓云网后台&#x…

<van-empty description=““ /> 滚动条bug

使用 <van-empty description"" /> 时&#xff0c;图片出现了个滚动条&#xff0c;图片可以上下滑动。 代码如下&#xff1a; <block wx:if"{{courseList.length < 0}}"><van-empty description"" /> </block> <…

VLAN原理+配置

目录 一&#xff0c; 以太网二层交换机 二&#xff0c;三层架构&#xff1a; 三&#xff0c;VLAN配置思路 1.创建vlan 2.接口划入vlan 3.trunk干道 4.vlan间路由器 5.DHCP池塘配置 四&#xff0c;华为VLAN部分的接口模式讲解&#xff1a; 五&#xff0c;华为VLAN部分的…

mysql报错:name ‘_mysql‘ is not defined

原因是&#xff1a; Mysqldb 不兼容 python3.5 以后的版本 解决办法&#xff1a; 使用pymysql代替MySQLdb 在项目应用下的__init__.py 添加上去 import pymysqlpymysql.version_info (1, 4, 13, "final", 0) pymysql.install_as_MySQLdb()

【ChatGLM_02】LangChain知识库+Lora微调chatglm2-6b模型+提示词Prompt的使用原则

经验沉淀 1 知识库1.1 Langchain知识库的主要功能(1) 配置知识库(2) 文档数据测试(3) 知识库测试模式(4) 模型配置 2 微调2.1 微调模型的概念2.2 微调模型的方法和步骤(1) 基于ptuning v2 的微调(2) 基于lora的微调 3 提示词3.1 Prompts的定义及原则(1) Prompts是什么&#xf…

计算机网络(4) --- 协议定制

计算机网络&#xff08;3&#xff09; --- 网络套接字TCP_哈里沃克的博客-CSDN博客https://blog.csdn.net/m0_63488627/article/details/132035757?spm1001.2014.3001.5501 目录 1. 协议的基础知识 TCP协议通讯流程 ​编辑 2.协议 1.介绍 2.手写协议 1.内容 2.接口 …

Vue3 watch监听器

概览&#xff1a;watch监听器的定义以及使用场景。在vue3中的监听器的使用方式&#xff0c;watch的三个参数&#xff0c;以及进一步了解第一个参数可以是一个属性&#xff0c;也可以是一个数组的形式包含多个属性。 watch在vue3和vue2中的使用&#xff1a; vue3中&#xff1a…