Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Conversation

@Thomas-Daniel
Copy link

No description provided.

syms = ['M', 'CM', 'D', 'CD', 'C', 'XC', 'L', 'XL', 'X', 'IX', 'V', 'IV', 'I']
roman = ''
i = 0
while n > 0:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will get stuck in an infinite loop if we pass 0, which we can't exclude from happening.
Should rewrite without while-loop, something like this (not tested):

for v, s in zip(val, syms):
    count = n // v
    roman += s * count
    n -= v * count


def get_style_num_map(styles_root, numId_to_abstract, abstract_to_levels, num_root):
style_num_map = {}
for style in styles_root.findall('w:style', NS):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there are two sources for the style_num_map, the non-abstract and the abstract ones, each being handled by a different for-loop.
Are they independent from one another? Does it makes sense to put them in the same map?
Also, intuitively, I would create two separate functions (or sub-functions, depending on whether we want them in the same map or not)

return style_num_map

def extract_paragraph_text(para):
return ''.join([node.text or '' for node in para.findall('.//w:t', NS)]).strip()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might not work for paragraphs that contain track changes (w:ins and w:del), to check.
In that case, we probably want to pass a parameter to decide whether we want the version pre or post changes, with post changes by default I guess.

continue
style_id = style.attrib.get(f'{{{NS["w"]}}}styleId')
pPr = style.find('w:pPr', NS)
if pPr is not None:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be if pPr: for a more Pythonic way to write it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xml.etree.ElementTree: Testing the truth value of an Element is deprecated. In a future release it will always return True. Prefer explicit len(elem) or elem is not None tests instead. zipimport.zipimporter.load_module() is deprecated: use exec_module() instead.

return style_num_map

def extract_paragraph_text(para):
return ''.join([node.text or '' for node in para.findall('.//w:t', NS)]).strip()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you can remove the [ ... ] to make it a generator comprehension and avoid creating a list in memory.

Suggested change
return ''.join([node.text or '' for node in para.findall('.//w:t', NS)]).strip()
return ''.join(node.text or '' for node in para.findall('.//w:t', NS)).strip()

return ''.join([node.text or '' for node in para.findall('.//w:t', NS)]).strip()

def get_list_number(para, style_num_map, counters, numId_to_abstract, abstract_to_levels):
pPr = para.find('w:pPr', NS)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good argument to integrate that code into the parsing done by python-docx would be that it will parse such info only once, such that we can simply do para.pPr afterwards whenever we need it, and avoid losing performance running .find(...) each time.

if counters[numId][ilvl] == 0:
counters[numId][ilvl] = level_def.get('start', 1)
else:
counters[numId][ilvl] += 1
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the function is called get_list_number, I would expect it to return something but not modify its parameters, like counters here.
I would expect the counters mapping to be already constructed already. Here this looks like it create inconsistent numbering if we call get_list_number twice for the same paragraph.


lvlText = level_def.get('lvlText', '')
fmt_values = {}
for level in range(9):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid any confusion regarding to fact that "level" starts at 1 and not 0, I would initialize counters as defaultdict(lambda: defaultdict(lambda: 1)) and initialize level here at 1:

                for level in range(1, 10):

and remove the + 1s afterwards.

Maybe applies to ilvl as well?

And by the way, if 10 the maximum level that can possibly be defined, per the specs?

for level in range(9):
if f'%{level + 1}' in lvlText:
if counters[numId][level] == 0:
level_start = abstract_to_levels[abstractId].get(level, {}).get('start', 1)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is a bit overkill, but we could initialize abstract_to_levels sub-dict with {'start': 1} by default, such that we don't need the double .get

numbering_str = get_list_number(para, style_num_map, counters, numId_to_abstract, abstract_to_levels)
text = extract_paragraph_text(para)
if text:
lines.append(f'{numbering_str}{text}')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a space in between? Or maybe f'{numbering_str} {text.lstrip()}' to have one space at most.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.