os.walk without digging into directories below

Question

How do I limit os.walk to only return files in the directory I provide it?

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        for f in files:
            if os.path.splitext(f)[1] in whitelist:
                outputList.append(os.path.join(root, f))
            else:
                self._email_to_("ignore")
    return outputList

Another case where the multitude of possible approaches and all the caveats that go with them suggests that this functionality should be added to the Python standard library. — antred, Commented Oct 31, 2016 at 19:26
files_with_full_path = [f.path for f in os.scandir(dir) if f.is_file()]. In case you need only the filenames use f.name instead of f.path. This is the fastest solution and much faster than any walk or listdir, see stackoverflow.com/a/40347279/2441026. — user136036, Commented Jan 24, 2020 at 13:08

Yuval Adam · Accepted Answer · 2008-10-23 10:30:34Z

249

Don't use os.walk.

Example:

import os

root = "C:\\"
for item in os.listdir(root):
    if os.path.isfile(os.path.join(root, item)):
        print(item)

edited Oct 23, 2008 at 10:30

answered Oct 23, 2008 at 10:15

Yuval Adam

5 Comments

user1329187 Over a year ago

@576i: this does not differentiate between files and directories

Daniel F Over a year ago

@Alexandr os.path.isfile and os.path.isdir let you differentiate. I don't get it, since os.path.isfile has been in the sample code since '08 and your comment is from '16. This is clearly the better answer, as you're not intending to walk a directory, but to list it.

user1329187 Over a year ago

@DanielF, what I meant here is that you need to loop over all items, while walk gives you immediately the separate lists of dirs and files.

Daniel F Over a year ago

Ah, ok. Actually Alex's answer seems to be better (using .next()) and it's much closer to your idea.

ascripter Over a year ago

Python 3.5 has an os.scandir function which allows more sophisticated file-or-directory-object interaction. See my answer below.

nosklo · Accepted Answer · 2010-08-19 19:40:02Z

122

Use a walklevel helper function like the following (it is not part of the standard library).

import os

def walklevel(some_dir, level=1):
    some_dir = some_dir.rstrip(os.path.sep)
    assert os.path.isdir(some_dir)
    num_sep = some_dir.count(os.path.sep)
    for root, dirs, files in os.walk(some_dir):
        yield root, dirs, files
        num_sep_this = root.count(os.path.sep)
        if num_sep + level <= num_sep_this:
            del dirs[:]

It works just like os.walk, but you can pass it a level parameter that indicates how deep the recursion will go.
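As a rough illustration of the level parameter, the sketch below repeats the walklevel helper so it runs on its own and tries it on a throwaway tree. The demo_walklevel function and the a/b directory names are made up for this example. Note that level=0 is what limits the walk to the directory you pass in, while the default level=1 also visits its immediate subdirectories.

```python
import os
import tempfile

def walklevel(some_dir, level=1):
    """Same helper as in the answer, repeated so this sketch is self-contained."""
    some_dir = some_dir.rstrip(os.path.sep)
    assert os.path.isdir(some_dir)
    num_sep = some_dir.count(os.path.sep)
    for root, dirs, files in os.walk(some_dir):
        yield root, dirs, files
        num_sep_this = root.count(os.path.sep)
        if num_sep + level <= num_sep_this:
            del dirs[:]  # prune in place so os.walk does not descend further

def demo_walklevel():
    """Hypothetical demo: build top/a/b and record which directories each level visits."""
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "a", "b"))
        visited = {}
        for level in (0, 1, 2):
            visited[level] = [os.path.relpath(root, top)
                              for root, _, _ in walklevel(top, level)]
        return visited

if __name__ == "__main__":
    print(demo_walklevel())
```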

edited Aug 19, 2010 at 19:40

answered Oct 24, 2008 at 16:46

nosklo

9 Comments

safetyduck Over a year ago

Does this function actually "walk" through the whole structure and then delete the entries below a certain point? Or is something more clever going on? I'm not even sure how to check this with code. --python beginner

Daniel Svendsen Oct 7 at 7:34

num_sep is how many separators ("/" on Mac, "\" on Windows) the path given to the function has. This tells you how "deep" you are in your file system, e.g. "/Users/Pedro/Desktop" would be level 3. The code keeps track of how deep it is looking, and when it gets too deep it runs del dirs[:]. The dirs variable is the list of subdirectory names that os.walk will visit next. Running del dirs[:] only empties that list in memory; it does not touch the directories themselves. Since os.walk uses that list to walk deeper in the next iteration, the descent stops there.

nosklo Over a year ago

@mathtick: when some directory on or below the desired level is found, all of its subdirs are removed from the list of subdirs to search next. So they won't be "walked".

Zach Young Over a year ago

I just +1'd this because I was struggling with how to "delete" dirs. I had tried dirs = [] and dirs = None but those didn't work. map(dirs.remove, dirs) worked, but with some unwanted '[None]' messages printed. So, why del dirs[:] specifically?

Daniel Svendsen Oct 7 at 7:39

The dirs variable is a list whose elements are the names of the subdirectories. By running dirs = [] you create a new empty list and bind the name dirs to it, but the original list is left unchanged. Running del dirs[:] empties the original list instead. os.walk keeps its own reference to that original list and reads it when deciding where to go next in its iteration.

dthor Over a year ago

Note that this doesn't work when using topdown=False in os.walk. See the 4th paragraph in the docs:

Modifying dirnames when topdown is False has no effect on the behavior of the walk, because in bottom-up mode the directories in dirnames are generated before dirpath itself is generated.
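A small sketch of that caveat, assuming a throwaway tree with a single subdirectory named sub (the function names are made up for illustration): with topdown=False, clearing dirs does not stop the walk, because the subdirectory has already been visited by the time its parent is yielded.

```python
import os
import tempfile

def visited_bottom_up(top):
    """Walk bottom-up while clearing dirs; return visited paths relative to top."""
    seen = []
    for root, dirs, files in os.walk(top, topdown=False):
        seen.append(os.path.relpath(root, top))
        del dirs[:]  # too late here: the subdirectories were already walked
    return seen

def demo_bottom_up():
    """Hypothetical demo on a temporary tree containing one subdirectory."""
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "sub"))
        return visited_bottom_up(top)

if __name__ == "__main__":
    print(demo_bottom_up())
```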

nosklo Over a year ago

@ZacharyYoung dirs = [] and dirs = None won't work because they just create a new unrelated object and assign to the name dirs. The original list object needs to be modified in-place, not the name dirs.
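To make the difference between rebinding and in-place modification concrete, here is a minimal sketch. The visited_dirs and demo_pruning names and the single sub directory are assumptions made for this example only.

```python
import os
import tempfile

def visited_dirs(top, in_place):
    """Walk top and try to prune at every step; return visited paths relative to top."""
    seen = []
    for root, dirs, files in os.walk(top):
        seen.append(os.path.relpath(root, top))
        if in_place:
            del dirs[:]  # empties the list object that os.walk reads next
        else:
            dirs = []    # only rebinds the local name; os.walk's list is untouched
    return seen

def demo_pruning():
    """Hypothetical demo: returns (result with rebinding, result with in-place delete)."""
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "sub"))
        return visited_dirs(top, in_place=False), visited_dirs(top, in_place=True)

if __name__ == "__main__":
    print(demo_pruning())
```

With rebinding the walk still enters sub; with del dirs[:] it stops at the top directory.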

Pieter · Accepted Answer · 2023-04-25 07:57:52Z

74

I think the solution is actually very simple.

Use break to run only the first iteration of the for loop. There may be a more elegant way.

for root, dirs, files in os.walk(dir_name):
    for f in files:
        ...
        ...
    break
...

os.walk returns a generator. Its first iteration yields the tuple for the directory you passed in; each later iteration yields the tuple for the next subdirectory.

Take the original script and just add a break.

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        for f in files:
            if os.path.splitext(f)[1] in whitelist:
                outputList.append(os.path.join(root, f))
            else:
                self._email_to_("ignore")
        break
    return outputList
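For readers who want to try the break approach outside a class, here is a standalone sketch under a few assumptions: top_level_files and demo_top_level_files are hypothetical names, the email side effect is left out, and the file names are invented for the demo.

```python
import os
import tempfile

def top_level_files(dir_name, extensions):
    """Return full paths of files directly inside dir_name with an allowed extension."""
    matches = []
    for root, dirs, files in os.walk(dir_name):  # topdown=True is the default
        for f in files:
            if os.path.splitext(f)[1] in extensions:
                matches.append(os.path.join(root, f))
        break  # stop after the first (top-level) directory
    return matches

def demo_top_level_files():
    """Hypothetical demo: only a.txt is at the top level and matches .txt."""
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "sub"))
        for rel in ("a.txt", "b.log", os.path.join("sub", "c.txt")):
            open(os.path.join(top, rel), "w").close()
        return [os.path.relpath(p, top) for p in top_level_files(top, {".txt"})]

if __name__ == "__main__":
    print(demo_top_level_files())
```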

edited Apr 25, 2023 at 7:57

answered Jan 1, 2014 at 12:44

Pieter

3 Comments

Alecz Over a year ago

This should have been the accepted answer. Simply adding a "break" after the "for f in files" loop stops the recursiveness. You might also want to make sure that topdown=True.

Steven Marsh Over a year ago

I just want to add this comment and say thank you for saving me time at work for giving such a good simplistic answer.

Tomsim Over a year ago

same here. It's simple and imho straight forward. I'm just wondering if this behavior is in the function specification.

CervEd · Accepted Answer · 2019-03-01 12:57:10Z

28

The suggestion to use listdir is a good one. The direct answer to your question in Python 2 is root, dirs, files = os.walk(dir_name).next().

The equivalent Python 3 syntax is root, dirs, files = next(os.walk(dir_name))
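One caveat worth sketching: os.walk ignores listing errors by default, so for a missing or unreadable directory the generator is empty and a bare next() raises StopIteration. The version below passes a default tuple to next() to avoid that. The list_top_level and demo_list_top_level names are assumptions for this example.

```python
import os
import tempfile

def list_top_level(dir_name):
    """Return (dirs, files) directly inside dir_name, or two empty lists if unreadable."""
    # The second argument to next() is returned when the generator yields nothing.
    root, dirs, files = next(os.walk(dir_name), (dir_name, [], []))
    return dirs, files

def demo_list_top_level():
    """Hypothetical demo: one existing directory and one missing path."""
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "sub"))
        open(os.path.join(top, "f.txt"), "w").close()
        existing = list_top_level(top)
        missing = list_top_level(os.path.join(top, "missing"))
        return existing, missing

if __name__ == "__main__":
    print(demo_list_top_level())
```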

edited Mar 1, 2019 at 12:57

CervEd

answered Oct 23, 2008 at 10:46

Alex Coventry

4 Comments

Setori Over a year ago

Oh i was getting all sort of funny error from that one. ValueError: too many values to unpack

Daniel F Over a year ago

Nice! Feels like a hack, though. Like when you turn on an engine but only let it do one revolution and then pull the key to let it die.

Evan Over a year ago

Stumbled across this; root, dirs, files = os.walk(dir_name).next() gives me AttributeError: 'generator' object has no attribute 'next'

CervEd Over a year ago

@Evan, probably because this is from 2008 and uses Python 2 syntax. In Python 3 you can write root, dirs, files = next(os.walk(dir_name)) and then the variables root, dirs, files will only correspond to the variables of the generator at the dir_name level.

Greg Hewgill · Accepted Answer · 2008-10-23 10:06:02Z

15

You could use os.listdir() which returns a list of names (for both files and directories) in a given directory. If you need to distinguish between files and directories, call os.stat() on each name.
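A minimal sketch of the os.stat() approach, assuming the hypothetical helper name split_entries and an invented demo tree. The stat module's S_ISDIR and S_ISREG functions interpret the st_mode field.

```python
import os
import stat
import tempfile

def split_entries(directory):
    """Return (files, dirs) names directly inside directory, classified via os.stat()."""
    files, dirs = [], []
    for name in sorted(os.listdir(directory)):
        # listdir returns bare names, so join them with the directory first
        mode = os.stat(os.path.join(directory, name)).st_mode
        if stat.S_ISDIR(mode):
            dirs.append(name)
        elif stat.S_ISREG(mode):
            files.append(name)
    return files, dirs

def demo_split_entries():
    """Hypothetical demo on a temporary directory with one file and one subdirectory."""
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "sub"))
        open(os.path.join(top, "f.txt"), "w").close()
        return split_entries(top)

if __name__ == "__main__":
    print(demo_split_entries())
```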

answered Oct 23, 2008 at 10:06

Greg Hewgill

1 Comment

Iuri Guilherme Over a year ago

or os.path.isdir: [d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d))]

martineau · Accepted Answer · 2011-10-29 09:59:01Z

10

If you have more complex requirements than just the top directory (eg ignore VCS dirs etc), you can also modify the list of directories to prevent os.walk recursing through them.

ie:

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        dirs[:] = [d for d in dirs if is_good(d)]
        for f in files:
            do_stuff()

Note - be careful to mutate the list, rather than just rebind it. Obviously os.walk doesn't know about the external rebinding.
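As one possible concrete version of the idea, the sketch below skips version-control directories. The SKIP_DIRS set, walk_skipping, and the demo tree are all assumptions made for illustration, standing in for is_good and do_stuff.

```python
import os
import tempfile

SKIP_DIRS = {".git", ".hg", ".svn"}  # hypothetical ignore list

def walk_skipping(top, skip=SKIP_DIRS):
    """Yield file paths under top, never descending into directories named in skip."""
    for root, dirs, files in os.walk(top):
        # slice assignment mutates the list os.walk holds, rather than rebinding the name
        dirs[:] = [d for d in dirs if d not in skip]
        for f in files:
            yield os.path.join(root, f)

def demo_walk_skipping():
    """Hypothetical demo: the file under .git is never reached."""
    with tempfile.TemporaryDirectory() as top:
        for folder, name in ((".git", "config"), ("src", "main.py")):
            os.makedirs(os.path.join(top, folder))
            open(os.path.join(top, folder, name), "w").close()
        return sorted(os.path.relpath(p, top) for p in walk_skipping(top))

if __name__ == "__main__":
    print(demo_walk_skipping())
```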

edited Oct 29, 2011 at 9:59

martineau

answered Oct 23, 2008 at 10:49

Brian

masterxilo · Accepted Answer · 2016-05-03 15:43:13Z

8

import os

for path, dirs, files in os.walk('.'):
    print(path, dirs, files)
    del dirs[:]  # go only one level deep

answered May 3, 2016 at 15:43

masterxilo

ascripter · Accepted Answer · 2019-05-27 12:26:03Z

Since Python 3.5 you can use os.scandir instead of os.listdir. Instead of strings you get an iterator of DirEntry objects in return. From the docs:

Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because DirEntry objects expose this information if the operating system provides it when scanning a directory. All DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.

You can access the name of the object via DirEntry.name which is then equivalent to the output of os.listdir
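A short sketch of how this might look for the original question. The scan_top_level name and the demo tree are assumptions for this example; using os.scandir as a context manager requires Python 3.6 or later.

```python
import os
import tempfile

def scan_top_level(directory):
    """Return (file_names, dir_names) directly inside directory using os.scandir."""
    files, dirs = [], []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                files.append(entry.name)  # entry.path would give the joined path instead
            elif entry.is_dir():
                dirs.append(entry.name)
    return sorted(files), sorted(dirs)

def demo_scan_top_level():
    """Hypothetical demo: nested files are not listed because scandir does not recurse."""
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "sub"))
        open(os.path.join(top, "f.txt"), "w").close()
        open(os.path.join(top, "sub", "nested.txt"), "w").close()
        return scan_top_level(top)

if __name__ == "__main__":
    print(demo_scan_top_level())
```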

Matt R · Accepted Answer · 2022-09-08 08:47:46Z

5

Felt like throwing my 2 pence in.

baselevel = len(rootdir.split(os.path.sep))
for subdirs, dirs, files in os.walk(rootdir):
    curlevel = len(subdirs.split(os.path.sep))
    if curlevel <= baselevel + 1:
        ...  # do stuff

edited Sep 8, 2022 at 8:47

answered Jun 2, 2017 at 8:14

Matt R

1 Comment

pauljohn32 Over a year ago

Helpful, except "\\" assumes Windoze OS. Use os.path.sep

Dmitriy Simushev · Accepted Answer · 2015-10-22 12:32:52Z

4

The same idea with listdir, but shorter:

[f for f in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, f))]

edited Oct 22, 2015 at 12:32

Dmitriy Simushev

answered Jun 25, 2014 at 20:38

Oleg Gryb

Pedro J. Sola · Accepted Answer · 2019-06-05 15:31:04Z

The root folder changes for every directory os.walk finds. I solved that by checking if root == dir_name.

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        if root == dir_name:  # true only for the top-level folder
            for f in files:
                if os.path.splitext(f)[1] in whitelist:
                    outputList.append(os.path.join(root, f))
                else:
                    self._email_to_("ignore")
    return outputList

Diana G · Accepted Answer · 2012-10-18 23:15:07Z

2

You could also do the following:

for path, subdirs, files in os.walk(dir_name):
    for name in files:
        if path == dir_name:  # keep only the files directly inside dir_name
            pass  # code here

answered Oct 18, 2012 at 23:15

Diana G

1 Comment

Pieter Over a year ago

Won't this loop through all sub-dir's and files unnecessarily ?

Jay Sheth · Accepted Answer · 2016-04-01 14:13:41Z

2

In Python 3, I was able to do this:

import os
dir = "/path/to/files/"

#List all files immediately under this folder:
print ( next( os.walk(dir) )[2] )

#List all folders immediately under this folder:
print ( next( os.walk(dir) )[1] )

answered Apr 1, 2016 at 14:13

Jay Sheth

1 Comment

user1329187 Over a year ago

This also works for Python 2. How to get the second level?

PiMathCLanguage · Accepted Answer · 2018-11-29 21:18:20Z

1

Why not simply use a range and os.walk combined with zip? It is not the best solution, but it would work too.

For example like this:

# your part before
for count, (root, dirs, files) in zip(range(0, 1), os.walk(dir_name)):
    # logic stuff
# your later part

Works for me on python 3.

Also: A break is simpler too btw. (Look at the answer from @Pieter)
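A related variant, which is not what this answer uses, is itertools.islice: it takes the first few items of any iterator without the dummy range. The sketch below assumes the hypothetical names first_directories and demo_first_directories. As with the zip approach, the count is the number of directories visited, not a depth.

```python
import os
import tempfile
from itertools import islice

def first_directories(dir_name, count=1):
    """Return the first `count` (root, dirs, files) tuples that os.walk yields."""
    return list(islice(os.walk(dir_name), count))

def demo_first_directories():
    """Hypothetical demo on top/sub/deeper, taking the first two tuples."""
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "sub", "deeper"))
        return [(os.path.relpath(root, top), dirs)
                for root, dirs, _ in first_directories(top, 2)]

if __name__ == "__main__":
    print(demo_first_directories())
```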

answered Nov 29, 2018 at 21:18

PiMathCLanguage

Rich · Accepted Answer · 2019-09-30 17:09:40Z

1

import os

def listFiles(self, dir_name):
    names = []
    for root, directory, files in os.walk(dir_name):
        if root == dir_name:
            for name in files:
                names.append(name)
    return names

answered Sep 30, 2019 at 17:09

Rich

1 Comment

kenny_k Over a year ago

Hi Rich, welcome to Stack Overflow! Thank you for this code snippet, which might provide some limited short-term help. A proper explanation would greatly improve its long-term value by showing why this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you've made.

Deifyed · Accepted Answer · 2015-01-06 17:59:14Z

0

This is how I solved it

if recursive:
    items = os.walk(target_directory)
else:
    items = [next(os.walk(target_directory))]

...

edited Jan 6, 2015 at 17:59

answered Jan 6, 2015 at 17:47

Deifyed

Kemin Zhou · Accepted Answer · 2015-09-23 18:42:26Z

0

There is a catch when using listdir: it returns bare names, so the argument to os.path.isdir() must be joined with the directory path (otherwise it is resolved against the current working directory). To pick subdirectories you do:

for dirname in os.listdir(rootdir):
    if os.path.isdir(os.path.join(rootdir, dirname)):
        print("I got a subdirectory: %s" % dirname)

The alternative is to change into the directory first, so the test works without the os.path.join().

answered Sep 23, 2015 at 18:42

Kemin Zhou

alexandre-rousseau · Accepted Answer · 2016-08-24 08:56:51Z

0

You can use this snippet

level = 1  # assumed starting value; counts directories visited, not depth
for root, dirs, files in os.walk(directory):
    if level > 0:
        pass  # do some stuff
    else:
        break
    level -= 1

answered Aug 24, 2016 at 8:56

alexandre-rousseau

Hamsavardhini · Accepted Answer · 2017-11-21 09:49:23Z

Create a list of excludes, use fnmatch to skip the matching directories, and process the rest.

excludes = [r'a\*\b', r'c\d\e']  # raw strings, so '\b' is not read as a backspace
for root, directories, files in os.walk('Start_Folder'):
    if not any(fnmatch.fnmatch(nf_root, pattern) for pattern in excludes):
        for root, directories, files in os.walk(nf_root):
            ....
            do the process
            ....

The same works for 'includes':

if any(fnmatch.fnmatch(nf_root, pattern) for pattern in includes):

Oleg · Accepted Answer · 2019-01-30 13:59:55Z

0

A slight change to Alex's answer, but using __next__():

print(next(os.walk('d:/'))[2]) or print(os.walk('d:/').__next__()[2])

with the [2] being the files element of the root, dirs, files tuple mentioned in other answers.

answered Jan 30, 2019 at 13:59

Oleg

Alon Barad · Accepted Answer · 2020-12-23 07:44:32Z

Here is a Python example that walks to a limited depth:

import os

def walk_with_depth(root_path, depth):
    if depth < 0:
        # negative depth: no limit, walk everything
        for root, dirs, files in os.walk(root_path):
            yield [root, dirs[:], files]

        return

    elif depth == 0:
        return

    base_depth = root_path.rstrip(os.path.sep).count(os.path.sep)
    for root, dirs, files in os.walk(root_path):
        yield [root, dirs[:], files]

        cur_depth = root.count(os.path.sep)

        if base_depth + depth <= cur_depth:
            del dirs[:]  # stop descending below the requested depth
