10

I recently edited an existing document to create a new one from it (that is: I copied the whole folder to a new location and started from there). The early document had a lot of figures, but not all of them were used in the new version.

Now I have a lot of unused files (jpg, pdf, png) under the /fig which I want to get rid of, because they are not called by any \includegraphics command.

Is there a way to list used or unused files? (I'm not referring to auxiliary files, I'm fine with those.)

6 Answers

9

I came up with this little script (run it from the root folder of the project):

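#!/bin/bash

# Loop over the glob instead of parsing ls, so names with spaces survive
for image_path in fig/*
do
    image_file=$(basename "$image_path")
    # -q: only set the exit status; -F: match the name literally, not as a regex
    # (the earlier "-c > 1" redirected the count into a file named "1")
    if grep -qF -- "$image_file" *.log
    then
        echo "File $image_file is in use."
    else
        echo "File $image_file is not in use."
        mv "fig/$image_file" "fig/moved.$image_file" # or any other action
    fi
done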
  • 2
    This solution worked well for me but first I had to update some settings in texmf.cnf to prevent line breaks - otherwise some image files (with long names) were not found in the log file because their name spanned 2 lines (See this SE question) – Stefan Avey Jan 06 '17 at 22:05
  • Is there an easy fix to the newline problem on the level of the bash script? – Marten Apr 19 '21 at 13:02
  • This seems to work for me. As some long file names are broken across lines in the log, I just got rid of all the line breaks:
    `tr -d '\n' < myfile.log | sed -E 's/\(/\n(/g' > better_log.txt`

    The tr command removes all the newlines, and the sed command adds a newline just before each opening parenthesis. You might not really need to add any newlines back in. Now just use "better_log.txt" in place of the original log file in whatever script you use.

    – troy Jan 24 '24 at 01:16
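
The same unwrapping idea can be sketched in Python, which is the language the later answers use. TeX wraps log lines (typically at 79 characters), so the sketch drops every newline before doing a plain substring check. The function names `unwrap_log` and `split_used_unused`, and the `fig` folder layout, are assumptions made for illustration only.

```python
from pathlib import Path


def unwrap_log(log_text: str) -> str:
    """Join wrapped log lines by dropping every newline character."""
    return log_text.replace("\r", "").replace("\n", "")


def split_used_unused(file_names, log_text):
    """Return (used, unused) lists of names, judged by presence in the log."""
    flat = unwrap_log(log_text)
    used = [name for name in file_names if name in flat]
    unused = [name for name in file_names if name not in flat]
    return used, unused


if __name__ == "__main__":
    # Assumed layout: figures in ./fig and the *.log files in the current folder.
    fig_dir = Path("fig")
    logs = sorted(Path(".").glob("*.log"))
    log_text = "".join(p.read_text(errors="replace") for p in logs)
    names = sorted(p.name for p in fig_dir.iterdir() if p.is_file()) if fig_dir.is_dir() else []
    used, unused = split_used_unused(names, log_text)
    for name in unused:
        print(f"File {name} is not in use.")
```

Because this is a substring check, a file called `a.png` would also count as used when only `data.png` appears in the log, so it is worth reviewing the list before moving or deleting anything.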
3

In case someone is still looking, I have made a Python 3 script to deal with this problem. I use it to generate a new, clean LaTeX folder with all the used files directly at the root of the folder, instead of spread across multiple subdirectories. This is a requirement for preprint servers like arXiv and HAL.

(If you only want to delete unused files, then simply use the content of the newly created clean folder)

The script takes as input:

  • a list of TeX files to parse (in case you split your document into multiple files, located in the same folder)
  • a list of file extensions of the potentially unused files we wish to look for
  • some other self-explanatory options

The script looks in the specified TeX files for all occurrences of the specified extension and builds a list of all used files with this extension. All these files are copied over to a new specified folder. Other files found at the root of the TeX folder are also copied for convenience (except TeX compilation files, and the previous unused files). The provided TeX files are copied over as well, but all their references to the files are changed so that they point directly to the new files at the root of the new folder.

That way, you directly obtain a compilation-ready LaTeX folder with all the files you need.

Here is the code:

import os, sys, shutil
import re
import ntpath

############ INPUTS ###############

# list of TeX files to parse
# (they should all be within the same folder, as the image paths
# are computed relative to the first TeX file)
texPathList = ["/home/my/tex/folder/my_first_file.tex", "/home/my/tex/folder/my_second_file.tex"]

# extensions to search
extensions=[".png", ".jpg", ".jpeg", ".pdf", ".eps"]

bExcludeComments = True # if True, files appearing in comments will not be kept

# path where all used images and the modified TeX files should be copied
# (you can then copy over missing files, e.g. other types of images, Bib files...)
# location of the new folder (should not exist already)
exportFolder = '/home/my/new/folder/clean_article/'

# should all other files in the root folder (not in subfolders) be copied ?
# (temporary TeX compilation files are not copied)
bCopyOtherRootFiles = True

############## CREATE CLEAN FOLDER #################

# 1 - load TeX files
text=''
for path in texPathList:
    with open(path,'r') as f:
        text = text + f.read()

# 2 - find all occurrences of the extension
global_matches = []
for extension in extensions:
    escaped_extension = '\\'+extension # so that the point is correctly accounted for
    pattern=r'{[^}]+'+escaped_extension+'}'
    if not bExcludeComments: # simply find all occurrences
        matches = re.findall(pattern=pattern, string=text) # does not give the position
    else: # more involved search
        # 2.1 - find all matches
        positions, matches = [], []
        regex = re.compile(pattern)
        for m in regex.finditer(text):
            print(m.start(), m.group())
            positions.append( m.start() )
            matches.append( m.group())
        # 2.2 - remove matches which appear in a commented line
        # parse list in reverse order and remove if necessary
        for i in range(len(matches)-1,-1,-1):
            # look backwards in text for the first occurrence of '\n' or '%'
            startPosition = positions[i]
            # stop at the start of the text (a match on the very first line)
            while startPosition >= 0:
                if text[startPosition]=='%':
                    # the line is commented
                    print('file "{}" is commented (discarded)'.format(matches[i]))
                    positions.pop(i)
                    matches.pop(i)
                    break
                if text[startPosition]=='\n':
                    # the line is not commented --> we keep it
                    break
                startPosition -= 1
    global_matches = global_matches + matches

# 3 - make sure there are no duplicates
fileList = set(global_matches)
if len(global_matches) != len(fileList):
    print('WARNING: it seems you have duplicate images in your TeX')

# 3.1 - remove curly braces
fileList = [m[1:-1] for m in fileList]

# 4 - copy the used images to the designated new location
try:
    os.makedirs(exportFolder)
except FileExistsError:
    raise Exception('The new folder already exists, please delete it first')

texRoot = os.path.dirname(texPathList[0])
for m in fileList:
    absolutePath = os.path.join(texRoot, m)
    shutil.copy(absolutePath, exportFolder)

# 5 - copy the TeX files also, and modify the image paths they refer to
for path in texPathList:
    with open(path,'r') as f:
        text = f.read()
    for m in fileList:
        text = text.replace(m, ntpath.basename(m) )
    newPath = os.path.join(exportFolder, ntpath.basename(path))
    with open(newPath, 'w') as f:
        f.write(text)

# 6 - if chosen, copy over all the other files (except TeX temp files)
# which are directly at the root of the original TeX folder
if bCopyOtherRootFiles:
    excludedExtensions = ['.aux', '.bak', '.blg', '.bbl', '.spl', '.gz', '.out', '.log']
    for filename in os.listdir(texRoot):
        fullPath = os.path.join(texRoot, filename)
        if os.path.isfile(fullPath):
            ext = os.path.splitext(filename)[1]
            # do not copy already modified TeX files
            if not ( filename in [ntpath.basename(tex) for tex in texPathList]):
                # do not copy temporary files
                if not ( ext.lower() in excludedExtensions ):
                    # do not copy files we have already taken care of
                    if not ( ext.lower() in extensions ):
                        shutil.copy( fullPath, exportFolder)

The export folder now contains the modified TeX files and all the required files!
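
For the "only delete unused files" case mentioned earlier, one possible way to use the clean folder is to compare it against the original project and list whatever was not copied. This is only a sketch: `list_unused` is a hypothetical helper, it compares base names only, and it assumes the export folder lies outside the original TeX folder.

```python
import os


def list_unused(original_root, export_folder, extensions):
    """Return paths under original_root whose base name is absent from export_folder."""
    kept = set(os.listdir(export_folder))  # the clean folder is flat
    unused = []
    for root, _, files in os.walk(original_root):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext in extensions and name not in kept:
                unused.append(os.path.join(root, name))
    return sorted(unused)


# Example call (paths are placeholders):
# for path in list_unused("/home/my/tex/folder", "/home/my/new/folder/clean_article",
#                         [".png", ".jpg", ".jpeg", ".pdf", ".eps"]):
#     print(path)
```

Two figures that share a base name in different subfolders cannot be told apart this way, and a compiled output PDF sitting in the original folder would be reported too, so the list should be checked by hand before deleting.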

Laurent90
  • 161
3

I wrote about it here: medium.com/@weslley.spereira/remove-unused-files-from-your-latex-project. In short, I slightly generalized Alessandro Cuttin's script to cover more directory levels. I hope it still helps.

# $projectFolder and $mainfilename must be set before running this
nonUsed="./nonUsedFiles"
mkdir -p "$nonUsed"

# Directory Level 1
for imgFolder in $(ls -d "$projectFolder"/*/); do
    echo "$imgFolder"
    for imageFile in $(ls "$imgFolder"); do
        echo "$imageFile"
        # -q: only set the exit status; -F: match the name literally
        if grep -qF -- "$imageFile" "$projectFolder/$mainfilename.log"; then
            echo "+ File $imageFile is in use."
        else
            echo "- File $imageFile is not in use."
            mkdir -p "$nonUsed/$imgFolder"
            mv "$imgFolder/$imageFile" "$nonUsed/$imgFolder$imageFile"
        fi
    done
done

# Directory Level 2
for imgFolder in $(ls -d "$projectFolder"/*/*/); do
    echo "$imgFolder"
    for imageFile in $(ls "$imgFolder"); do
        echo "$imageFile"
        if grep -qF -- "$imageFile" "$projectFolder/$mainfilename.log"; then
            echo "+ File $imageFile is in use."
        else
            echo "- File $imageFile is not in use."
            mkdir -p "$nonUsed/$imgFolder"
            mv "$imgFolder/$imageFile" "$nonUsed/$imgFolder$imageFile"
        fi
    done
done
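
Instead of repeating the loop once per directory level, the same idea can be sketched for any depth with `os.walk`. This is an illustration under assumptions, not the script from the linked article: `move_unused` and its parameters are hypothetical names, the extension list is a guess at typical figure types, and the function only reports what it would move unless `dry_run=False` is passed.

```python
import os
import shutil


def move_unused(project_folder, log_path, non_used="nonUsedFiles",
                extensions=(".png", ".jpg", ".jpeg", ".pdf", ".eps"), dry_run=True):
    """Move figures not mentioned in the log into non_used, keeping sub-paths."""
    with open(log_path, errors="replace") as f:
        log_text = f.read().replace("\n", "")  # undo TeX's log line wrapping
    extensions = tuple(extensions)
    project_abs = os.path.abspath(project_folder)
    non_used_abs = os.path.abspath(non_used)
    moved = []
    for root, dirs, files in os.walk(project_folder):
        # do not descend into the folder that receives the unused files
        dirs[:] = [d for d in dirs
                   if os.path.abspath(os.path.join(root, d)) != non_used_abs]
        if os.path.abspath(root) == project_abs:
            continue  # like the shell version, only look inside subfolders
        for name in files:
            if not name.lower().endswith(extensions) or name in log_text:
                continue
            source = os.path.join(root, name)
            target = os.path.join(non_used, os.path.relpath(source, project_folder))
            moved.append(source)
            if not dry_run:
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.move(source, target)
    return sorted(moved)
```

Calling it first with the default `dry_run=True` and printing the returned list gives a chance to review the candidates before anything is moved.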

1

I'm not sure I understand your question. If you would like to clean up a directory and get rid of auxiliary files and, let's say, all *.jpg files, and you are on Windows, you could use a PowerShell script published by U. Ziegenhagen here: http://uweziegenhagen.de/?p=2095. Customise it, put it into your folder and press Shift + right-click. Beware: it deletes in a second...

My adaptation includes files produced by tex4ht and SyncTeX:

function Get-ScriptDirectory{
    $Invocation = (Get-Variable MyInvocation -Scope 1).Value
    Split-Path $Invocation.MyCommand.Path
}

$path = (Get-ScriptDirectory)

cd $path


get-childitem *.log |% {remove-item $_}

get-childitem *.toc |% {remove-item $_}

get-childitem *.gz |% {remove-item $_}

get-childitem *.aux |% {remove-item $_}

get-childitem *.nav |% {remove-item $_}

get-childitem *.out |% {remove-item $_}

get-childitem *.synctex |% {remove-item $_}

get-childitem *.synctex.gz |% {remove-item $_}

get-childitem *.tmp |% {remove-item $_}

get-childitem *.4ct |% {remove-item $_}

get-childitem *.4tc |% {remove-item $_}

get-childitem *.anl |% {remove-item $_}

get-childitem *.lg |% {remove-item $_}

get-childitem *.idv |% {remove-item $_}

get-childitem *.xref |% {remove-item $_}
Keks Dose
  • 30,892
  • Actually, I don't want to delete all .jpg files, but only those that are no longer used by a new document (that is, there is no \includegraphics calling them). – Alessandro Cuttin Aug 09 '14 at 11:07
1

A bit late to the party, but here is another Python approach.

Workflow for this CLI application:

  • call python3 move_unused_figures_from_latex_project.py from latex project root directory
  • indicate in which folder the figures are located
  • specify a folder name to place unused figures in
  • specify per image if you really want to move it

Note: To check whether a figure is used in the LaTeX project, the function string_found_in_tex_files is called. It checks whether the file path (without its extension) occurs anywhere in a .tex file in the project. As a result, figures that are only referenced in comments are not moved, and neither are figures whose path occurs elsewhere in the text for unrelated reasons.

move_unused_figures_from_latex_project.py:

import shutil
import os

# NOTE: main function is down below

def ask_existing_folder_name(default_folder_name=None) -> str:
    """ ask user for a folder name. """

    if default_folder_name is None:
        folder_name = input('Please enter an existing directory name:')
    else:
        folder_name = input(
                f'Please enter an existing directory (default = {default_folder_name}):')

        if folder_name in ['', 'y', 'Y']:
            folder_name = default_folder_name

    if os.path.isdir(folder_name):
        return folder_name

    print('That folder does not exist!')
    return ask_existing_folder_name(default_folder_name=default_folder_name)


def ask_new_folder_name(default_folder_name=None) -> str:
    """ ask user to input a new folder name. """

    if default_folder_name is None:
        folder_name = input('Please enter a new directory name:')
    else:
        folder_name = input(
                f'Please enter a new directory name (default = {default_folder_name}):')

        if folder_name in ['', 'y', 'Y']:
            folder_name = default_folder_name

    if not os.path.isdir(folder_name):
        return folder_name

    print('That folder does already exist!')
    return ask_new_folder_name(default_folder_name=default_folder_name)


def string_found_in_tex_files(string_to_search: str) -> bool:
    """ return True if there exists a .tex file in the current directory
    or any subdirectory that contains string_to_search. """

    print(f"search string {string_to_search}")
    for root, _, files in os.walk("."):
        for filename in files:
            filepath = os.path.join(root, filename)
            if filepath.endswith('.tex') and os.path.isfile(filepath):
                with open(filepath) as file:
                    if string_to_search in file.read():
                        return True
    return False


def main():
    """ interactive CLI that moves unused figures from latex project. """

    print("welcome, we're going to remove all unused figures from this latex project")
    print('NOTE: make sure to run this function from latex project root\n')

    figures_folder_name = ask_existing_folder_name(default_folder_name='figures/')

    print('unused figures are moved to a new directory')
    unused_figures_folder_name = ask_new_folder_name(default_folder_name='unused_figures/')
    os.mkdir(unused_figures_folder_name)

    figure_file_paths = []

    extensions = (".pdf", ".jpg", ".png", ".eps")

    # collect all relative paths to figures
    for root, _, files in os.walk(figures_folder_name):
        for filename in files:
            if filename.endswith(extensions):
                file_path = os.path.join(root, filename)
                figure_file_paths.append(file_path)

    only_used_figures_detected = True
    for file_path in figure_file_paths:

        # take away the extension
        (file_path_without_extension, _) = os.path.splitext(file_path)

        if not string_found_in_tex_files(file_path_without_extension):

            only_used_figures_detected = False

            answer = input(f'{file_path} is unused, '\
                    f'do you want to move it to {unused_figures_folder_name} (Y/n)?')
            if answer in ['n', 'N', 'no']:
                continue

            # move the file
            shutil.move(file_path, unused_figures_folder_name)
            print(f'{file_path} moved to {unused_figures_folder_name}')

    if only_used_figures_detected:
        print('all figures are used :)')


if __name__ == '__main__':
    main()
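
As noted before the script, figures that appear only in commented-out lines still count as used. If that is not wanted, one option is to strip comments from each .tex file before searching. The sketch below is a simplified illustration: `strip_tex_comments` and `string_found_outside_comments` are hypothetical helpers, and the regular expression ignores verbatim environments and other edge cases.

```python
import re

# A '%' starts a comment unless it is escaped as '\%'.
# Simplification: in '\\%' (a line break followed by a comment) the '%'
# is preceded by a backslash, so it is not recognised as a comment here.
COMMENT = re.compile(r"(?<!\\)%.*")


def strip_tex_comments(tex_source: str) -> str:
    """Remove everything from an unescaped '%' to the end of each line."""
    return "\n".join(COMMENT.sub("", line) for line in tex_source.splitlines())


def string_found_outside_comments(string_to_search: str, tex_source: str) -> bool:
    """Return True if string_to_search occurs in tex_source outside comments."""
    return string_to_search in strip_tex_comments(tex_source)
```

Inside a function like string_found_in_tex_files, the check on the file contents could then be replaced by a call to `string_found_outside_comments` with the text returned by `file.read()`.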

0

Check out my typical Makefile

# This is a LaTeX Makefile created by Predrag Punosevac#
########################################################
SHELL = /bin/sh
.SUFFIXES : .tex .dvi .ps .pdf

FILE = sam-new

LATEX = /usr/local/bin/latex
PDFLATEX = /usr/local/bin/pdflatex
BIBTEX = /usr/local/bin/bibtex
XDVI = /usr/local/bin/xdvi
DVIPS = /usr/local/bin/dvips
GVU = /usr/local/bin/gvu
PS2PDF = /usr/local/bin/ps2pdf
XPDF = /usr/local/bin/xpdf 
LPR = /usr/bin/lpr

DVI = ${FILE}.dvi
PS = ${FILE}.ps
PDF = ${FILE}.pdf



.tex.pdf :
       ${PDFLATEX} ${FILE}.tex
       ${PDFLATEX} ${FILE}.tex



bib :   
       ${PDFLATEX} ${FILE}.tex
       ${BIBTEX} ${FILE}
pdf : bib
       ${PDFLATEX} ${FILE}.tex
       ${PDFLATEX} ${FILE}.tex



# Various cleaning options
clean-ps :
          /bin/rm -f  *.log *.aux *.dvi *.bbl *.blg *.bm *.toc *.out \
          *Notes.bib *.ps

I typically call

  make pdf clean-ps

using keybindings from nvi.