Discussion:
[Interest] QDir::entry(Info)List on macos
Manner Róbert
2018-11-14 09:34:17 UTC
Hi,

I would like to list all files and directories in a folder on macOS
(High Sierra). Unfortunately, every approach I tried misses file names with
accented letters, e.g. éáűúőóüí. If a file name contains at least one of
these, it is silently omitted from the list.

Here is a test program to reproduce:

#include <QDir>
#include <QDebug>
#include <QFileInfoList>

int main() {
  qDebug() << QDir(".").entryList();

  qDebug() << QDir(".").entryInfoList(QDir::System | QDir::AllEntries |
QDir::Hidden);
}

I have also tried specifying filters (hidden, system, etc.), but they do
not seem to help.

Do you know about this problem? Do you know a workaround? Qt version is
5.11.2

Thanks in advance,

Robert
Thiago Macieira
2018-11-14 16:34:02 UTC
Post by Manner Róbert
int main() {
qDebug() << QDir(".").entryList();
QCoreApplication missing. Try again with it.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Manner Róbert
2018-11-15 07:54:38 UTC
Post by Thiago Macieira
Post by Manner Róbert
int main() {
qDebug() << QDir(".").entryList();
QCoreApplication missing. Try again with it.
Tried it, without success: these files are still not displayed. I even tried
QDirIterator, which behaves the same way (it skips these files).

With further checking I noticed that the files are missing only
if I did not create them with Qt, e.g. if I created them with "touch filenamé". If
I create the same file with QFile, it is found by these directory
listings. I know it sounds insane.
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
$ cat main.cpp
#include <QDirIterator>
#include <QDebug>
#include <QFileInfoList>
#include <QCoreApplication>
#include <QFile>

int main(int argc, char** argv) {
  // This file is visible in the list! But not any other I create with
  // touch, for example.
  QFile file("éáőú");
  file.open(QFile::WriteOnly);
  file.close();

  QCoreApplication app(argc, argv);
  QDirIterator iterator(".", QDir::Files, QDirIterator::Subdirectories);
  while (iterator.hasNext())
    {
      qDebug() << "QDiriterator" << iterator.next();
    }

  qDebug() << QDir(".").entryList(QDir::System | QDir::AllEntries |
QDir::Hidden);

  qDebug() << QDir(".").entryInfoList(QDir::System | QDir::AllEntries |
QDir::Hidden);
}

Thanks in advance for any idea.

Robert
Olivier B.
2018-11-15 08:49:08 UTC
What is the encoding of your source file?
QString constructors interpret char* as UTF-8. If the
source file is saved in your local encoding, the QString created for
the QFile constructor will hold the wrong Unicode representation of the
file name you wanted, and Qt will then convert what it thinks is UTF-8 into
your local encoding to pass the file name to the system calls. Maybe the Mac
file browser can work around this and adjust the displayed name, but the
filesystem interface of Qt can't, because that puts forbidden
characters in the real name?
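
To illustrate the encoding question above: a source file saved in a single-byte local encoding such as Latin-1 stores é as the lone byte 0xE9, which is not valid UTF-8, whereas a UTF-8 file stores it as 0xC3 0xA9. The following is a minimal sketch in plain C++ (no Qt). The helper name looksLikeUtf8 is hypothetical, and the check is deliberately simplified: it verifies only the lead-byte/continuation-byte structure.

```cpp
#include <cstddef>
#include <string>

// Returns true if `bytes` is structurally valid UTF-8: every lead byte is
// followed by the expected number of continuation bytes (10xxxxxx).
// Simplified sketch: it does not reject every overlong form or surrogate.
bool looksLikeUtf8(const std::string &bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;
        if (lead < 0x80)
            extra = 0;                      // ASCII
        else if (lead >= 0xC2 && lead <= 0xDF)
            extra = 1;                      // 2-byte sequence
        else if (lead >= 0xE0 && lead <= 0xEF)
            extra = 2;                      // 3-byte sequence
        else if (lead >= 0xF0 && lead <= 0xF4)
            extra = 3;                      // 4-byte sequence
        else
            return false;                   // stray continuation or invalid lead

        if (extra > bytes.size() - i - 1)
            return false;                   // sequence is cut off

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;               // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

Passing the two bytes 0xC3 0xA9 returns true, while the single byte 0xE9 returns false. Note that this only tells a UTF-8 file from a legacy-encoded one; it says nothing about which normalization form the name uses.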
_______________________________________________
Interest mailing list
http://lists.qt-project.org/mailman/listinfo/interest
Manner Róbert
2018-11-15 13:54:22 UTC
Hi,

((
file main.cpp
main.cpp: c program text, UTF-8 Unicode text
))

I think I have found the bug which causes this:
https://bugreports.qt.io/browse/QTBUG-70732

I am currently working on a workaround. I am now able to list the files
using <dirent.h>:

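        #include <dirent.h>
        #include <cstdio>
        #include <QString>
        #include <QStringList>

        // Wrapper added so the fragment compiles; the name is illustrative.
        QStringList listDirectory(const QString &path)
        {
          QStringList results;
          const QByteArray bpath = path.toUtf8();
          DIR *dir = opendir(bpath.constData());
          if (dir != nullptr) {
            /* collect all the files and directories within directory */
            struct dirent *ent;
            while ((ent = readdir(dir)) != nullptr) {
              results << QString::fromUtf8(ent->d_name);
            }
            closedir(dir);
          } else {
            /* could not open directory */
            perror("opendir");
          }
          return results;
        }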

This returns the files successfully. The downside is that the names are in
Unicode Normalization Form D (NFD, which iconv calls UTF-8-mac), which
represents an accented character in decomposed form, like O": the base
character (O) followed by the accent ("). "ls -1" prints names the same way.
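
As a rough way to spot such decomposed names, one can scan the UTF-8 bytes for code points from the Combining Diacritical Marks block (U+0300 to U+036F). This is a minimal sketch in plain C++ (no Qt), assuming the input is valid UTF-8. The helper name hasCombiningMark is hypothetical, and the check is a heuristic: it covers only that one block, not every character that NFD decomposes.

```cpp
#include <cstddef>
#include <string>

// Returns true if the UTF-8 string contains a code point in
// U+0300..U+036F (Combining Diacritical Marks), which is how
// decomposed names carry accents such as the diaeresis U+0308.
// Assumes `utf8` is valid UTF-8.
bool hasCombiningMark(const std::string &utf8)
{
    for (std::size_t i = 0; i + 1 < utf8.size(); ++i) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        const unsigned char next = static_cast<unsigned char>(utf8[i + 1]);
        // U+0300..U+033F are encoded as CC 80 .. CC BF
        if (lead == 0xCC && next >= 0x80 && next <= 0xBF)
            return true;
        // U+0340..U+036F are encoded as CD 80 .. CD AF
        if (lead == 0xCD && next >= 0x80 && next <= 0xAF)
            return true;
    }
    return false;
}
```

For the decomposed name "O" followed by the bytes 0xCC 0x88 this returns true; for the precomposed bytes 0xC3 0x96 it returns false.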

A QString like that does not compare equal to a "normal" (that is,
Normalization Form C, NFC) string. E.g.:

  QString ch1("\u00D6");   // this is normal representation of "Ö"
  QString ch2("O\u0308");  // O followed by the combining diaeresis U+0308
  qDebug() << ch1 << ch2 << (ch1 == ch2 ? "matches!" : "does not match");

This outputs: "Ö" "Ö" "does not match" not only on mac, but even on linux.
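
The mismatch is easier to see at the byte level. QString compares UTF-16 code units rather than canonical equivalence; the sketch below shows the same effect on the UTF-8 bytes using plain C++ (no Qt). The helper name toHex is hypothetical.

```cpp
#include <cstdio>
#include <string>

// Formats each byte of `bytes` as two lowercase hex digits,
// separated by single spaces.
std::string toHex(const std::string &bytes)
{
    std::string out;
    char buf[4];
    for (unsigned char c : bytes) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}
```

With this helper, the precomposed Ö (bytes 0xC3 0x96) dumps as "c3 96", while O plus U+0308 dumps as "4f cc 88". The two strings render identically but differ in both length and content, so a plain equality test fails.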

So I also needed to convert "UTF-8 to UTF-8" :) for my complete
workaround. Unfortunately I did not find a way to do so with QTextCodec.
(Is there one?) So I am now trying iconv... it seems to do the trick, but
it is quite ugly and inefficient.

#include <iconv.h>
#include <cerrno>
#include <cstring>

QByteArray
utf8_unmac(const QByteArray &utf8_mac)
{
  // iconv_open() takes (tocode, fromcode): convert UTF-8-mac (NFD) to UTF-8 (NFC)
  iconv_t conv = iconv_open("UTF-8", "UTF-8-mac");
  if (conv == (iconv_t)-1)
    {
      RAISE(TestRunnerException(QString("Iconv open failed: ") +
                                strerror(errno)));
    }

  // iconv advances these pointers and counters, hence the local copies
  char *inp = const_cast<char *>(utf8_mac.constData());
  size_t inp_len = utf8_mac.size();

  size_t out_len = inp_len * 2;

  QByteArray utf8;
  utf8.resize(out_len);

  char *out = utf8.data();

  if (iconv(conv, &inp, &inp_len, &out, &out_len) == (size_t)-1)
    {
      const int err = errno;  // save errno before iconv_close() can change it
      iconv_close(conv);
      RAISE(TestRunnerException(QString("Iconv convert failed: ") +
                                strerror(err)));
    }

  utf8.chop(out_len);  // out_len is now the unused tail of the buffer
  iconv_close(conv);
  return utf8;
}

Hope it is useful for someone. To be honest, I always imagined that if
everyone would use UTF-8 life would be much better, now I'm unsure ;)

Br,

Robert
Thiago Macieira
2018-11-15 16:07:51 UTC
Post by Manner Róbert
Hope it is useful for someone. To be honest, I always imagined that if
everyone would use UTF-8 life would be much better, now I'm unsure
Blame Apple for deciding to use an uncommon normalisation back in the day for
HFS. HFS stores file names in NFD even if you type them in NFC, which is what
you do in Terminal.app. APFS, however, doesn't convert.

Anyway, this is an open bug that needs a fix. I can't fix it because I don't
have access to any Mac with APFS (my 2011 Mac Mini is running HFS).
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Henry Skoglund
2018-11-15 23:40:00 UTC
Post by Thiago Macieira
Post by Manner Róbert
Hope it is useful for someone. To be honest, I always imagined that if
everyone would use UTF-8 life would be much better, now I'm unsure
Blame Apple for deciding to use an uncommon normalisation back in the day for
HFS. The HFS stores filenames in NFD, even if you type NFC and that's what you
do in Terminal.app. But APFS doesn't convert.
Anyway, this is an open bug that needs a fix. I can't fix it because I don't
have access to any Mac with APFS (my 2011 Mac Mini is running HFS).
Hi, couldn't resist looking into this bug (my 2012 MBP has 10.14 with APFS):

The problem I think is that the encodeName(decodeName(filename)) if
statement in qfilesystemiterator_unix.cpp is lossy; the incoming
direntries can be either encoded as NFC (e.g. files created in Terminal)
or NFD (files created via a Qt App or Finder), but the
Q_OS_DARWIN-flavored QFile::encodeName() function is hardwired to use
NormalizationForm_D only, that's why the NFC types of direntries get
tossed out :-(
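
The filtering effect described above can be modelled with a toy example in plain C++ (no Qt). This is a sketch under loud assumptions: the three-entry table stands in for real Unicode normalization data, and the names toNfdToy, toNfcToy, strictKeeps and lenientKeeps are hypothetical. strictKeeps mimics a round trip that always re-encodes as NFD; lenientKeeps accepts a name that is unchanged in either form.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <utility>

// Toy table (UTF-8): {precomposed form, decomposed form}.
// Only three characters; real code needs full Unicode data.
static const std::array<std::pair<const char *, const char *>, 3> kPairs = {{
    {"\xC3\xA9", "e\xCC\x81"},  // e with acute = e + U+0301
    {"\xC3\xA1", "a\xCC\x81"},  // a with acute = a + U+0301
    {"\xC3\x96", "O\xCC\x88"},  // O with diaeresis = O + U+0308
}};

// Replaces every occurrence of `from` in `text` with `to`.
std::string replaceAll(std::string text, const std::string &from,
                       const std::string &to)
{
    for (std::size_t pos = text.find(from); pos != std::string::npos;
         pos = text.find(from, pos + to.size()))
        text.replace(pos, from.size(), to);
    return text;
}

// Decomposes the characters known to the toy table.
std::string toNfdToy(std::string name)
{
    for (const auto &p : kPairs)
        name = replaceAll(name, p.first, p.second);
    return name;
}

// Composes the characters known to the toy table.
std::string toNfcToy(std::string name)
{
    for (const auto &p : kPairs)
        name = replaceAll(name, p.second, p.first);
    return name;
}

// Models a round-trip check whose encoder always emits NFD:
// a name survives only if it was NFD already.
bool strictKeeps(const std::string &name)
{
    return toNfdToy(name) == name;
}

// Models the proposed relaxation: keep the name if it is
// unchanged in either its NFC or its NFD form.
bool lenientKeeps(const std::string &name)
{
    return toNfcToy(name) == name || toNfdToy(name) == name;
}
```

In this model a precomposed name (bytes 0xC3 0xA9) fails strictKeeps but passes lenientKeeps, while a decomposed or plain ASCII name passes both, which mirrors the dropped-entry behaviour reported in this thread.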

So one solution could be to accept direntries that match in either their
NFC or their NFD form, say something like:

bool QFileSystemIterator::advance(QFileSystemEntry &fileEntry,
                                  QFileSystemMetaData &metaData)
{
    if (!dir)
        return false;

    for (;;) {
        dirEntry = QT_READDIR(dir);

        if (dirEntry) {
            // process entries with correct UTF-8 names only
            // (any NFDs are now converted to NFCs)
            QString nfc = QFile::decodeName(dirEntry->d_name);

            if ((nfc.normalized(QString::NormalizationForm_C).toUtf8()
                 == dirEntry->d_name) ||
                (nfc.normalized(QString::NormalizationForm_D).toUtf8()
                 == dirEntry->d_name)) {
                fileEntry = QFileSystemEntry(nativePath + QByteArray(dirEntry->d_name),
                                             QFileSystemEntry::FromNativePath());
                metaData.fillFromDirEnt(*dirEntry);
                return true;
            }
        } else {
            break;
        }
    }

    lastError = errno;
    return false;
}

Note: this is written without any testing or compiling, just an idea!
Rgrds Henry
