Compressed Zip Folder, Msoft Word, characters to avoid?

  • #1
paulb203
TL;DR Summary
Which characters to avoid in names when compressing a folder of MS Word documents
I’m trying to compress a folder containing a combination of sub-folders and individual files. All the files are Word documents.
I’m having to rename some of the sub-folders as I’ve become aware that some characters have to be avoided. For example, MS have told me that apostrophes have to be avoided.
Does anyone here know the full list? Google, as ever, doesn’t lead me to a consensus. For example, someone said spaces are to be avoided, but I managed to compress a folder whose name contained spaces. Another said to avoid capital letters, but that's not a problem either.

Also, is it just the folder names, and sub-folder names, or does it apply to individual files too?
 
  • #2
Personally, instead of learning which ones to avoid, I would just limit yourself to A-Z, a-z, 0-9, the underscore "_" and the hyphen "-". These should work across operating systems as well, i.e. Windows, MacOS and Linux.

Why?

* and ? are used for file searching
' and " are used for quoting file names with embedded spaces for command line commands
. and / and \ and : and ; are used as separators in file names, paths and path lists, depending on the file system and OS. Examples include /xxx/yyy/zzz.doc on MacOS or Linux and c:\xxx\yyy\zzz.doc on Windows

Other special characters may crop up in scripting like ! and & and # and @ ...
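
A minimal sketch of that allow-list as a check, in Python. The function names are made up for illustration, and the check is meant for the name without its extension (so "zzz" rather than "zzz.doc"), since the list above does not include the dot.

```python
import re

# Allow-list from the post above: A-Z, a-z, 0-9, underscore and hyphen.
# The hyphen is placed last in the character class so it is read literally.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_safe_name(name: str) -> bool:
    """Return True if every character of `name` is in the allow-list."""
    return SAFE_NAME.fullmatch(name) is not None


def unsafe_characters(name: str) -> list[str]:
    """List the distinct characters of `name` outside the allow-list,
    in order of first appearance."""
    seen = []
    for ch in name:
        if not SAFE_NAME.fullmatch(ch) and ch not in seen:
            seen.append(ch)
    return seen
```

For instance, unsafe_characters("Paul's notes?") reports the apostrophe, the space and the question mark.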
 
  • #3
+1 to jedishrfu. I would avoid use of any apostrophes or similar special characters in filenames. You never know what they'll blow up down the line.

IMHO it is worth it for you to retrofit your existing files and folders with boring names and follow that policy moving forward.
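
If there are many names to retrofit, something like the following Python sketch could suggest the boring version of each one. boring_name is a hypothetical helper, not part of any tool mentioned here: it keeps the final extension, drops apostrophes, and turns every other run of disallowed characters into a single underscore. It only returns a suggestion and does not rename anything on disk.

```python
import re


def boring_name(name: str) -> str:
    """Suggest a 'boring' replacement for a file or folder name."""
    stem, dot, ext = name.rpartition(".")
    if not dot or not stem:
        # No extension at all, or a name that starts with a dot.
        stem, ext = name, ""

    def clean(part: str) -> str:
        # Drop straight and curly apostrophes entirely.
        part = part.replace("'", "").replace("\u2019", "")
        # Collapse each run of other disallowed characters to "_".
        part = re.sub(r"[^A-Za-z0-9_-]+", "_", part)
        return part.strip("_")

    cleaned = clean(stem)
    return f"{cleaned}.{clean(ext)}" if ext else cleaned
```

Note that two different original names can end up with the same suggestion, so it is worth checking for duplicates before renaming anything.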
 
  • #4
paulb203 said:
Also, is it just the folder names, and sub-folder names, or does it apply to individual files too?
It applies to files as well as folders.

@jedishrfu's advice is good: avoid everything except [A-Za-z0-9_.-] (I've also included . as using this is fine, as long as you recognise its special use to indicate an extension).

If you are working on both Windows and POSIX-based file systems (MacOS, Linux...) it is also a good idea to avoid capital letters because in Linux MyFile.txt and myfile.txt are different files but in Windows they refer to the same file so this would cause problems.
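
A rough way to spot names that would clash on a case-insensitive system, sketched in Python. case_collisions is a hypothetical helper, and str.casefold() is only an approximation of how any particular file system compares names.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        # Names with the same case-folded form land in the same group.
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]
```

Passing it ["MyFile.txt", "myfile.txt", "notes.txt"] returns one group containing the first two names.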

Other characters like spaces are also permitted, but that doesn't mean it is a good idea to use them because they may break other things (for instance if you use spaces you might have to surround the file name with quotes sometimes).

Also note that really_long_folder_or_file_names_are_a_bad_idea, as\are\deep\levels\of\nesting\particularly_with_long_names\for_folders, because in some cases the full path is limited to 260 characters (and sometimes it isn't).
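
To see which existing paths are getting close to that limit, a small Python sketch along these lines may help. The function names and the 40-character safety margin are assumptions chosen for illustration; 260 is the figure quoted in this thread.

```python
import os

WINDOWS_DEFAULT_LIMIT = 260  # the default total quoted in this thread


def all_paths(root):
    """Yield the absolute path of every folder and file under `root`."""
    root = os.path.abspath(root)
    for folder, subfolders, files in os.walk(root):
        for name in subfolders + files:
            yield os.path.join(folder, name)


def near_limit(paths, limit=WINDOWS_DEFAULT_LIMIT, margin=40):
    """Return (length, path) pairs that are within `margin` characters
    of `limit` (or beyond it), longest first."""
    hits = [(len(p), p) for p in paths if len(p) >= limit - margin]
    return sorted(hits, reverse=True)
```

Usage would be something like near_limit(all_paths("my_folder")), where "my_folder" stands in for whichever folder you intend to compress.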
 
  • #5
All excellent advice (this is all coming back to me now!)

I have a subsite on my webhost where my code makes no distinction between .jpg and .JPG - but my webhost does, so half my images are busted.
 
  • #6
Thanks a lot guys.

So, in summary;

Stick to; a-z, 0-9, underscore, and dash. That's it.

Avoid everything else, including long names for files or folders, and deep nesting (which I think means folders containing sub-folders which themselves contain sub-folders, so sub-sub-folders, I guess, and so on?).

Nb; I know not everyone said avoid uppercase and the full stop (period) but given what pbuk said, with caveats, I think it might be best for me to just avoid those too.

Q. If I've understood deep nesting correctly, is it ever problematic to have, say, a folder, containing a subfolder? A folder containing a subfolder which also contains a subfolder? At what point does it become potentially problematic?
 
  • #7
You might be interested in https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits

paulb203 said:
Q. If I've understood deep nesting correctly, is it ever problematic to have, say, a folder, containing a subfolder? A folder containing a subfolder which also contains a subfolder? At what point does it become potentially problematic?
According to the previous link, with NTFS (Windows), there is a limit for the maximum pathname length: 32,767 characters with each path component (directory or filename) up to 255 characters long.
 
  • #8
jack action said:
You might be interested in https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits


According to the previous link, with NTFS (Windows), there is a limit for the maximum pathname length: 32,767 characters with each path component (directory or filename) up to 255 characters long.
That is what NTFS supports, yes, so you can in theory store those files on your disk, however the default in the Windows API is a TOTAL of 260 characters as I said above, so these files cannot normally be accessed by applications. Which is not much use.



https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry
 
  • #9
Concerning the dot, I've had some issues using dot in some applications that characterize the file or directory by its ending file type.

As an example, while using the Obsidian app on MacOS, I named a directory my.math.notes, and MacOS decided it was a file and not a directory, so I had to change it to my-math-notes to avoid the issue.

Also, it's not good to use the "<", ">" or "|" characters, as they are used in command-line commands for redirection: "<" redirects input (input comes from the specified file) and ">" redirects output (output goes to the specified file). The "|" is for chaining commands together.

"&" is yet another as it has special uses in scripting and in marking a command on unix based OSes for background execution.

Also, the "\" used in Windows for directory path separator is a bad character for other OSes as it is interpreted as an escape character and removed from the string. If you've had C programming, then recall that "\r" is the carriage return, "\n" is the newline character, "\t" is the tab character, and there are others. Most will cause havoc in path/filenames.
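
Python string literals follow the same C-style escapes, so they make a convenient illustration of the effect. This is only a sketch of the general problem, not of what any particular shell does with a backslash; the path c:\temp\new is made up.

```python
# A Windows-style path written as an ordinary string literal:
# "\t" is read as a tab and "\n" as a newline, so the path is mangled.
mangled = "c:\temp\new"

# A raw string keeps the backslashes exactly as typed.
raw = r"c:\temp\new"

# Forward slashes sidestep the problem altogether.
forward = "c:/temp/new"

print(repr(mangled))           # shows the tab and newline that crept in
print(len(mangled), len(raw))  # the mangled version is two characters shorter
```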

Anyway, that's why it's best to stick with alphanumeric characters and "-" or "_" only.

As far as restricting subdirectories in zip files, I don't see the point. Zip files can preserve directory structure, allowing you to move files and directories as is, and this comes in very handy when doing backups or snapshots of your work or for sharing with others.
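
As an illustration of a zip file keeping the directory structure, here is a minimal Python sketch using the standard zipfile module. zip_folder is a hypothetical helper; it stores files only (empty sub-folders are skipped), and it assumes the zip file is written somewhere outside the folder being compressed.

```python
import os
import zipfile


def zip_folder(folder, zip_path):
    """Write every file under `folder` into `zip_path`, keeping the
    sub-folder structure (including the top-level folder name)."""
    folder = os.path.abspath(folder)
    parent = os.path.dirname(folder)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for current, _subfolders, files in os.walk(folder):
            for name in files:
                full = os.path.join(current, name)
                # Store each file under its path relative to the parent,
                # so the nesting is reproduced when the zip is extracted.
                archive.write(full, os.path.relpath(full, parent))
```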
 
  • #10
paulb203 said:
Q. If I've understood deep nesting correctly, is it ever problematic to have, say, a folder, containing a subfolder? A folder containing a subfolder which also contains a subfolder? At what point does it become potentially problematic?
When the total length of the path approaches 260 characters.
 
  • #11
jedishrfu said:
As an example, while using the Obsidian app on MacOS, I named a directory my.math.notes, and MacOS decided it was a file and not a directory, so I had to change them to my-math-notes to avoid the issue.
That sounds like an Obsidian bug - the API call to create a directory is different to the call to create a file, and they have different representations on disk. At least in any sensible file system they do - I'm not sure about APFS, and although MacOS is evil and perverse I really can't see it changing a descriptor in this way.

I cannot reproduce this in Obsidian 1.6.7/Big Sur.
 
  • #12
pbuk said:
When the total length of the path approaches 260 characters.
I'm not sure what 'the path' means. I Googled it but regretted it. I'm borderline tech illiterate, as you've probably gathered.
Does it mean that if you have a folder named folder_1 you've used up 8 of those 260 characters. And a subfolder named folder_2 you've used up 8 more? If so, would that mean you could have 30 or so nested folders before it became problematic?
 
  • #13
pbuk said:
When the total length of the path approaches 260 characters.

Boy, I have a story for this one. At one time, I was working on an Installshield script for our product. It was a developer kit for Windows and originally we had to add 3 directories to the system path during the install.

However, the developers in their infinite wisdom divided up the code into three separate areas (runtime, developer tools, customer examples), which meant we had to add 9 directories (bin, lib, dll for each area) to the system path.

They were added to the beginning of the path in order to supersede any Microsoft command/dll of the same name as ours. Sadly, when adding these 9 directories, each about 30+ characters long, we managed to push the system dll + commands directory past the 256 character limit. The path got chopped at 256 and on reboot nothing came up except for a pretty blue screen which made you want to take a nice vacation.

This happened to me on my workstation. Fortunately, I had given a copy of my environment variables to a coworker who had to rebuild his machine a few days earlier and with those in hand I was able to get things back to normal.

I filed a report with MS on it but heard nothing from them. It surprised me that they didn't respond since we were a big user of Microsoft products and this seemed like a major flaw. There was nothing in their docs about this limitation that I could find. I felt it was a holdover from DOS days and lean clean programming machines.

Curiously, the command session path allowed for 4096 characters, but we couldn't use that since much of our stuff was GUI based and launched via the desktop, not in a command session.
 
  • #14
paulb203 said:
I'm not sure what 'the path' means. I Googled it but regretted. I'm borderline tech illiterate, as you've probably gathered.
Does it mean that if you have a folder named folder_1 you've used up 8 of those 260 characters. And a subfolder named folder_2 you've used up 8 more? If so, would that mean you could have 30 or so nested folders before it became problematic?
When you open a command line session aka command shell where you can type commands like dir, copy, format and others, the command shell sets up a custom environment of parameters that the commands can access to configure themselves properly when they are called to run.

One such parameter is the path parameter which provides a list of directories for the command shell to search when it doesn't understand the command the user typed in.

When I type dir on windows or ls on unix OSes, the command shell knows these are built in and are common to all users. But if I type python then the command shell searches the list of directories listed in the path parameter looking for the Python command.

You can use the echo command to view the path parameter in windows:
Code:
echo %path%     (Shows the path parameter)

c:\Program files\python3\bin;c:\windows\system32;c:\windows

Typing python, the command shell will prepare to search the three directories and find python in the first and will then execute python.exe.

The set command will display all environment parameters.

set (Shows all environment parameters)
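
The same lookup can be sketched in Python with the standard library, which may be easier to experiment with than the command shell. path_directories and find_command are made-up names; os.environ, os.pathsep and shutil.which are the real standard-library pieces doing the work.

```python
import os
import shutil


def path_directories():
    """Return the directories listed in the PATH environment variable."""
    # os.pathsep is ";" on Windows and ":" on MacOS and Linux.
    return [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]


def find_command(command):
    """Return the full path of `command` as found by searching PATH,
    or None if no listed directory contains it."""
    return shutil.which(command)
```

Calling find_command("python") returns the location of the Python executable if one of the listed directories contains it.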
 
  • #15
paulb203 said:
I'm not sure what 'the path' means.
https://en.wikipedia.org/wiki/Path_(computing)

paulb203 said:
Does it mean that if you have a folder named folder_1 you've used up 8 of those 260 characters.
In Windows (which is the only place it matters) it means you have used 12: C:\folder_1\.

paulb203 said:
And a subfolder named folder_2 you've used up 8 more?
9 more: folder_2\.

paulb203 said:
If so, would that mean you could have 30 or so nested folders before it became problematic?
Yes but that doesn't mean nesting folders that deep is a good idea - it would be impossible to keep track of what is where.

Also note that working close to the 260 character limit is not a good idea - let's say you want to take a backup on another disk in the folder "D:\backups\pbuk-laptop\2024-10-16T09:09:23\" - that's another 43 characters added on the front (guess how I know this).
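
The character counts in this post can be checked with a few lines of Python. This is just the arithmetic, using the 260 figure quoted in the thread; as I understand it that figure also has to cover a terminating null character, so the usable total is slightly lower in practice.

```python
LIMIT = 260  # default Windows path limit quoted in this thread

first = "C:\\folder_1\\"   # drive, colon, backslash, name, backslash
nested = "folder_2\\"      # name plus its trailing backslash
backup_prefix = "D:\\backups\\pbuk-laptop\\2024-10-16T09:09:23\\"

print(len(first))          # 12
print(len(nested))         # 9
print(len(backup_prefix))  # 43

# How many 8-character folder names (plus a backslash each) fit after "C:\"?
max_depth = (LIMIT - len("C:\\")) // len(nested)
print(max_depth)           # 28
```

So "30 or so" is about right for 8-character names, and every extra character in a name brings that number down.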

Edit [aside]: I think I first encountered this limit when ripping my CD collection with the (excellent) dbPoweramp on its default settings when it tried to create something like C:\Users\pbuk\Music\ripped-cds\Bach: St Matthew Passion Gardiner, Rolfe Johnson, Et Al\Gloria - Chorus: Et in terra pax - Nancy Argenta, Jane Fairfield, Jean Knibbs, Collin Patrick, Ashley Stafford, Andrew Murgatroyd, Lloyd Morgan, Stephen Varcoe, English Baroque Soloists, John Eliot Gardiner, The Monteverdi Choir.flac
 
