{"id":399,"date":"2024-03-18T23:22:49","date_gmt":"2024-03-18T23:22:49","guid":{"rendered":"https:\/\/offensivepython.com\/?p=399"},"modified":"2024-03-19T21:26:20","modified_gmt":"2024-03-20T01:26:20","slug":"searching-files-recursively","status":"publish","type":"post","link":"https:\/\/offensivepython.com\/index.php\/2024\/03\/18\/searching-files-recursively\/","title":{"rendered":"Searching Files Recursively"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"399\" class=\"elementor elementor-399\">\n\t\t\t\t<div class=\"elementor-element elementor-element-509f3faa e-flex e-con-boxed e-con e-parent\" data-id=\"509f3faa\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t<div class=\"elementor-element elementor-element-6acc6a14 elementor-widget elementor-widget-text-editor\" data-id=\"6acc6a14\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>\u00a0 \u00a0 \u00a0I would typically NOT use Python for recursive grep operations, because it is going to be SUPER slow compared to <a href=\"https:\/\/github.com\/BurntSushi\/ripgrep\">ripgrep<\/a> on Linux or <strong>findstr<\/strong> on Windows.\u00a0 I have various shell scripts that use ripgrep at their core, and it is blazing fast.\u00a0 The built-in findstr command in Windows is not bad either.<\/p>\n<p>\u00a0 \u00a0 \u00a0Nonetheless, for small tasks where monster shares are not being searched, a quick Python solution can come in handy, and below is how it can be done.\u00a0 The goal here is to recursively search for a string of text and print the path of each file that has &#8220;hits&#8221;.<\/p>\n<div>\n<p><strong>from pathlib import Path<\/strong><br \/><strong>search_term = &quot;password&quot;<\/strong><br \/><strong>start_point = Path(&quot;\/tmp&quot;)<\/strong><br \/><strong># Walk everything under start_point and skip anything that is not a regular file<\/strong><br \/><strong>for file in start_point.rglob(&quot;*&quot;):<\/strong><br \/><strong>\u00a0 \u00a0 if not file.is_file():<\/strong><br \/><strong>\u00a0 \u00a0 \u00a0 \u00a0 continue<\/strong><br \/><strong>\u00a0 \u00a0 # Read raw bytes so binary files do not raise decoding errors<\/strong><br \/><strong>\u00a0 \u00a0 try:<\/strong><br \/><strong>\u00a0 \u00a0 \u00a0 \u00a0 file_content = file.read_bytes()<\/strong><br \/><strong>\u00a0 \u00a0 except OSError:<\/strong><br \/><strong>\u00a0 \u00a0 \u00a0 \u00a0 continue<\/strong><br \/><strong>\u00a0 \u00a0 if search_term.encode() in file_content:<\/strong><br \/><strong>\u00a0 \u00a0 \u00a0 \u00a0 print(str(file))<\/strong><\/p>\n<\/div>\n<p>\u00a0 \u00a0 \u00a0Output from my Windows computer is included below.\u00a0 On Windows, Path(&quot;\/tmp&quot;) resolves to \\tmp on the current drive, which is why the paths start with \\tmp.\u00a0 I only staged one test file there with the string password in it, but look how many files it found.\u00a0 I forgot that I had left so much in that folder.<\/p>\n<p>\\tmp\\test1.txt<br \/>\\tmp\\tmp\\c2-agent.c<br \/>\\tmp\\tmp\\client.py<br \/>\\tmp\\tmp\\server.py<br \/>\\tmp\\pyinstaller-develop\\bootloader\\Vagrantfile<br \/>\\tmp\\pyinstaller-develop\\doc\\bootloader-building.rst<br \/>\\tmp\\pyinstaller-develop\\.github\\ISSUE_TEMPLATE\\antivirus.md<br \/>\\tmp\\pyinstaller-develop\\tests\\functional\\test_django.py<br \/>\\tmp\\pyinstaller-develop\\tests\\functional\\test_libraries.py<br \/>\\tmp\\pyinstaller-develop\\tests\\functional\\data\\django\\db.sqlite3<br \/>\\tmp\\pyinstaller-develop\\tests\\functional\\data\\django\\django_site\\settings.py<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>\u00a0 \u00a0 \u00a0I would typically NOT use Python for recursive grep operations, because it is going to be SUPER slow compared to ripgrep on Linux or findstr on Windows.\u00a0 I have various shell scripts that use ripgrep at their core, and it is blazing fast.\u00a0 The built-in findstr command in Windows is not 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-container-style":"default","site-container-layout":"default","site-sidebar-layout":"default","site-transparent-header":"default","disable-article-header":"default","disable-site-header":"default","disable-site-footer":"default","disable-content-area-spacing":"default","footnotes":""},"categories":[32],"tags":[33,34],"class_list":["post-399","post","type-post","status-publish","format-standard","hentry","category-file-operations","tag-pathlib","tag-rglob"],"_links":{"self":[{"href":"https:\/\/offensivepython.com\/index.php\/wp-json\/wp\/v2\/posts\/399","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/offensivepython.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/offensivepython.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/offensivepython.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/offensivepython.com\/index.php\/wp-json\/wp\/v2\/comments?post=399"}],"version-history":[{"count":13,"href":"https:\/\/offensivepython.com\/index.php\/wp-json\/wp\/v2\/posts\/399\/revisions"}],"predecessor-version":[{"id":418,"href":"https:\/\/offensivepython.com\/index.php\/wp-json\/wp\/v2\/posts\/399\/revisions\/418"}],"wp:attachment":[{"href":"https:\/\/offensivepython.com\/index.php\/wp-json\/wp\/v2\/media?parent=399"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/offensivepython.com\/index.php\/wp-json\/wp\/v2\/categories?post=399"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/offensivepython.com\/index.php\/wp-json\/wp\/v2\/tags?post=399"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}