Commit a30de41

add WER calculation tutorial
1 parent 8f21ae7 commit a30de41
7 files changed, with 92 additions and 0 deletions.

‎README.md‎

@@ -60,6 +60,7 @@ This is a repository of all the tutorials of [The Python Code](https://www.thepy
  - [Tokenization, Stemming, and Lemmatization in Python](https://www.thepythoncode.com/article/tokenization-stemming-and-lemmatization-in-python). ([code](machine-learning/nlp/tokenization-stemming-lemmatization))
  - [How to Fine Tune BERT for Semantic Textual Similarity using Transformers in Python](https://www.thepythoncode.com/article/finetune-bert-for-semantic-textual-similarity-in-python). ([code](machine-learning/nlp/semantic-textual-similarity))
  - [How to Calculate the BLEU Score in Python](https://www.thepythoncode.com/article/bleu-score-in-python). ([code](machine-learning/nlp/bleu-score))
+ - [Word Error Rate in Python](https://www.thepythoncode.com/article/calculate-word-error-rate-in-python). ([code](machine-learning/nlp/wer-score))
  - ### [Computer Vision](https://www.thepythoncode.com/topic/computer-vision)
  - [How to Detect Human Faces in Python using OpenCV](https://www.thepythoncode.com/article/detect-faces-opencv-python). ([code](machine-learning/face_detection))
  - [How to Make an Image Classifier in Python using TensorFlow and Keras](https://www.thepythoncode.com/article/image-classification-keras-python). ([code](machine-learning/image-classifier))
@@ -0,0 +1,6 @@
# [Word Error Rate in Python](https://www.thepythoncode.com/article/calculate-word-error-rate-in-python)
- `pip install -r requirements.txt`
- `wer_basic.py` is the basic implementation of the WER algorithm.
- `wer_accurate.py` is the accurate implementation of the WER algorithm.
- `wer_jiwer.py` is the implementation of the WER algorithm using [jiwer](https://pypi.org/project/jiwer/).
- `wer_evaluate.py` is the implementation of the WER algorithm using [evaluate](https://pypi.org/project/evaluate/).
@@ -0,0 +1,3 @@
numpy
jiwer
evaluate
@@ -0,0 +1,44 @@
import numpy as np


def calculate_wer(reference, hypothesis):
    # Split the reference and hypothesis sentences into words
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    # Initialize a matrix of size (|ref_words| + 1) x (|hyp_words| + 1)
    # The extra row and column cover the case where one of the sequences is empty
    d = np.zeros((len(ref_words) + 1, len(hyp_words) + 1))
    # Turning the first i reference words into an empty hypothesis
    # takes i operations (i.e., deleting all i words)
    for i in range(len(ref_words) + 1):
        d[i, 0] = i
    # Turning an empty reference into the first j hypothesis words
    # takes j operations (i.e., inserting all j words)
    for j in range(len(hyp_words) + 1):
        d[0, j] = j
    # Iterate over the words in the reference and hypothesis
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            # If the current words are the same, no operation is needed
            # So we just take the previous minimum number of operations
            if ref_words[i - 1] == hyp_words[j - 1]:
                d[i, j] = d[i - 1, j - 1]
            else:
                # If the words are different, we consider three operations:
                # substitution, insertion, and deletion
                # And we take the minimum of these three possibilities
                substitution = d[i - 1, j - 1] + 1
                insertion = d[i, j - 1] + 1
                deletion = d[i - 1, j] + 1
                d[i, j] = min(substitution, insertion, deletion)
    # The minimum number of operations to transform the reference into the hypothesis
    # is in the bottom-right cell of the matrix
    # We divide this by the number of words in the reference to get the WER
    wer = d[len(ref_words), len(hyp_words)] / len(ref_words)
    return wer



if __name__ == "__main__":
    reference = "The cat is sleeping on the mat."
    hypothesis = "The cat is playing on mat."
    print(calculate_wer(reference, hypothesis))
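The dynamic-programming table above yields only the total number of edits. As a rough sketch, the same recurrence can also carry a breakdown into substitutions, deletions, and insertions. The helper name `wer_breakdown` is hypothetical and not part of this commit, the empty-reference check is an added assumption, and when several alignments are equally cheap the breakdown shown is just one of them.

```python
def wer_breakdown(reference, hypothesis):
    """Return (substitutions, deletions, insertions, wer) for two strings.

    Hypothetical helper for illustration; not part of the original commit.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        # Assumption: treat an empty reference as an error instead of dividing by zero
        raise ValueError("reference must contain at least one word")

    # Each cell holds (total_edits, substitutions, deletions, insertions)
    # for turning the first i reference words into the first j hypothesis words.
    # Row 0: an empty reference needs j insertions.
    prev = [(j, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        # Column 0: an empty hypothesis needs i deletions.
        curr = [(i, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                # Matching words cost nothing
                curr.append(prev[j - 1])
            else:
                t, s, d, n = prev[j - 1]
                substitution = (t + 1, s + 1, d, n)
                t, s, d, n = prev[j]
                deletion = (t + 1, s, d + 1, n)
                t, s, d, n = curr[j - 1]
                insertion = (t + 1, s, d, n + 1)
                # Tuples compare on total_edits first, so this picks a cheapest option
                curr.append(min(substitution, deletion, insertion))
        prev = curr

    total, subs, dels, inss = prev[-1]
    return subs, dels, inss, total / len(ref)


if __name__ == "__main__":
    print(wer_breakdown("The cat is sleeping on the mat.", "The cat is playing on mat."))
```

For the example pair used in this file, the sketch reports one substitution (`sleeping` to `playing`) and one deletion (`the`), that is, 2 edits over 7 reference words.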
@@ -0,0 +1,21 @@
def calculate_wer(reference, hypothesis):
    ref_words = reference.split()
    hyp_words = hypothesis.split()

    # Counting substitutions position by position (no alignment is performed),
    # then treating any length difference as deletions or insertions
    substitutions = sum(1 for ref, hyp in zip(ref_words, hyp_words) if ref != hyp)
    # Clamp at zero so the two counts cannot go negative and cancel each other out
    deletions = max(len(ref_words) - len(hyp_words), 0)
    insertions = max(len(hyp_words) - len(ref_words), 0)

    # Total number of words in the reference text
    total_words = len(ref_words)

    # Calculating the Word Error Rate (WER)
    wer = (substitutions + deletions + insertions) / total_words
    return wer


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat mat"
    print(calculate_wer(reference, hypothesis))
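For comparison, the standard definition of WER counts the substitutions $S$, deletions $D$, and insertions $I$ in a minimum-edit alignment and divides by the number of reference words $N$. Worked by hand for the example pair in this script (a derivation, not output from the script):

$$
\mathrm{WER} = \frac{S + D + I}{N}
$$

$$
\text{reference: "the cat sat on the mat"}, \quad \text{hypothesis: "the cat mat"}
$$

$$
S = 0, \quad D = 3 \ (\text{sat, on, the}), \quad I = 0, \quad N = 6 \quad\Longrightarrow\quad \mathrm{WER} = \frac{0 + 3 + 0}{6} = 0.5
$$

A position-by-position comparison instead pairs `sat` with `mat` and counts a substitution that the optimal alignment does not need, so it will generally not reproduce this value.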
@@ -0,0 +1,9 @@
import evaluate

wer = evaluate.load("wer")

# reference = "the cat sat on the mat"
# hypothesis = "the cat mat"
reference = "The cat is sleeping on the mat."
hypothesis = "The cat is playing on mat."
print(wer.compute(references=[reference], predictions=[hypothesis]))
@@ -0,0 +1,8 @@
from jiwer import wer

if __name__ == "__main__":
    # reference = "the cat sat on the mat"
    # hypothesis = "the cat mat"
    reference = "The cat is sleeping on the mat."
    hypothesis = "The cat is playing on mat."
    print(wer(reference, hypothesis))
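The scripts above score a single sentence pair. When several pairs are scored together, a common convention is to pool the edits: total edits across all pairs divided by total reference words, rather than an average of per-sentence WERs. The following is a dependency-free sketch of that convention under stated assumptions; `edit_distance` and `corpus_wer` are hypothetical names, and this is not the code of jiwer or evaluate.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance using two rolling rows."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j - 1] + cost,  # match or substitution
                            prev[j] + 1,         # deletion
                            curr[j - 1] + 1))    # insertion
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Pooled WER: sum of edits over sum of reference words (illustrative helper)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_edits = 0
    total_words = 0
    for reference, hypothesis in zip(references, hypotheses):
        ref_words = reference.split()
        total_edits += edit_distance(ref_words, hypothesis.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words


if __name__ == "__main__":
    references = ["the cat sat on the mat", "The cat is sleeping on the mat."]
    hypotheses = ["the cat mat", "The cat is playing on mat."]
    print(corpus_wer(references, hypotheses))
```

With the two example pairs from this commit, the pooled score is (3 + 2) edits over (6 + 7) reference words, which differs from the mean of the two per-sentence scores (0.5 and 2/7) because longer references carry more weight.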
