JFFS2 mount time

Ferenc Havasi

2004-10-20 14:26:27 UTC

Dear All,

Here is the latest version of our mount time improvement.

Usage:
- apply this patch on the latest version of MTD
- compile sumtool (make command in mtd/util)
- make your JFFS2 image as before (or you can use already created images
as well)
- run sumtool to insert summary information, for example:
./sumtool -i original.jffs2 -o new.jffs2 -e128KiB
- recompile your kernel with "JFFS2 inode summary support"

Jarkko made a measurement on a real NAND device: his JFFS2 image was
120819928 (115M), after running sumtool the new image was 123338752 (117M).

With the original image the mount time was 55 sec; with the new image it
is only 8.5 sec.
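
As a quick sanity check of these figures, the sketch below computes the size overhead and the mount speed-up from the four numbers quoted above. The variable names are illustrative only.

```python
# Figures quoted in the message above (bytes and seconds).
orig_image = 120819928
sum_image = 123338752
orig_mount_s = 55.0
sum_mount_s = 8.5

overhead_bytes = sum_image - orig_image
overhead_pct = 100.0 * overhead_bytes / orig_image
speedup = orig_mount_s / sum_mount_s

print(f"summary overhead: {overhead_bytes} bytes ({overhead_pct:.1f}% of the image)")
print(f"mount time speed-up: {speedup:.1f}x")
```

So the summaries cost roughly 2% of extra image space for a mount that is about 6.5 times faster on this device.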

It works much like our previous improvement: it stores special
information at the end of the erase blocks, and if this information is
present at mount time, scanning the erase block is unnecessary.

New things compared to our previous improvement:
- it was fully rewritten
- we separated the user space tool (sumtool) from mkfs
- sumtool now not only inserts the summary information but also does
some node reordering. There will be two kinds of erase blocks: the
"first type" will contain only jffs2_raw_inodes, and all other nodes
(jffs2_raw_dirent) will be stored in the "second type". It generates a
summary at the end of every "first type" erase block. (The "second type"
will be scanned as before, because all the information in
jffs2_raw_dirent is needed at mount time.)
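
The reordering described above can be pictured with a small Python model. This is only a sketch and not sumtool's actual code: the `Node` and `EraseBlock` classes, the `pack` function and the 16-byte summary entry (4 words, assuming 4-byte words) are assumptions made for illustration.

```python
from dataclasses import dataclass, field

SUMMARY_ENTRY_SIZE = 16  # assumed: 4 words of 4 bytes per inode entry


@dataclass
class Node:
    kind: str  # "inode" or "dirent" (anything else is treated like a dirent)
    size: int  # node length in bytes


@dataclass
class EraseBlock:
    inode_only: bool  # True for the "first type", False for the "second type"
    nodes: list = field(default_factory=list)
    used: int = 0     # bytes taken by nodes plus room kept for the summary


def pack(nodes, block_size):
    """Distribute nodes over inode-only and other erase blocks."""
    blocks = []
    current = {True: None, False: None}  # open block of each type
    for node in nodes:
        inode_only = node.kind == "inode"
        # Inode-only blocks keep room for one summary entry per node.
        cost = node.size + (SUMMARY_ENTRY_SIZE if inode_only else 0)
        blk = current[inode_only]
        if blk is None or blk.used + cost > block_size:
            blk = EraseBlock(inode_only)
            blocks.append(blk)
            current[inode_only] = blk
        blk.nodes.append(node)
        blk.used += cost
    return blocks
```

For example, packing two 100-byte inodes and one 50-byte dirent into 256-byte blocks yields one inode-only block (with room left for its summary) and one block holding the dirent.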

Of course, all of this is optional (as you can see above, you
have to select it in the kernel config). The JFFS2 image produced by
sumtool is also usable with previous kernels because the summary node is
JFFS2_FEATURE_RWCOMPAT_DELETE.

I think it can be useful not only for us. David, may I commit it to
CVS?

Regards,
Ferenc

Artem B. Bityuckiy

2004-10-21 06:29:11 UTC

Hello Ferenc,

As I understand it, you only prepare a JFFS2 image with summaries. This
is fine as long as nothing changes. For read-only file systems this
is OK.

But what if files/direntries are changed or deleted? Do you write summary
information dynamically? How are you going to place nodes/direntries in
different blocks dynamically?

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

Ferenc Havasi

2004-10-21 06:54:15 UTC

Hi Artem,

Post by Artem B. Bityuckiy
As I understand it, you only prepare a JFFS2 image with summaries. This
is fine as long as nothing changes. For read-only file systems this
is OK.
But what if files/direntries are changed or deleted? Do you write summary
information dynamically? How are you going to place nodes/direntries in
different blocks dynamically?

You are right. There is a small but really important change (it
will be ready very soon): extending jffs2_mark_node_obsolete() to mark
not only the node but also its entry in the summary.

Any other improvement can be done later, because after that the filesystem
will always be coherent: we write the summary only at the end of an
erase block, when the block is fully "finished" - so if there is a summary
somewhere we will not need to extend it, only to mark the obsolete nodes.

We also plan to implement, in the near future, the ability to generate
the summary dynamically when the filesystem finishes an erase block -
which will keep this "fast mount time" permanent.

Bye,
Ferenc

Artem B. Bityuckiy

2004-10-21 07:16:01 UTC

Post by Ferenc Havasi
You are right. There is a small but really important change (it
will be ready very soon): extending jffs2_mark_node_obsolete() to mark
not only the node but also its entry in the summary.

Unfortunately, you cannot mark entries as obsolete in your summary
node on NAND.

If you write your summary only for *full* blocks, you will not need to
mark entries obsolete, even on NOR flash (although on NOR you could).
Partially filled blocks must not have a summary node (you can
introduce a special marker, written to the OOB area of the last page on
NAND or to the last word of the sector on NOR, which tells whether a
summary node is present).

So fully filled blocks will have a summary and will be scanned very
quickly, partially filled ones will have no summary and will be fully
scanned, free blocks will have cleanmarkers and will not be scanned, and
other blocks will be either erased or considered free.
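
The per-block decision at mount time described above can be sketched as follows. This is a simplified model, not JFFS2 code: the `Action` enum and the `mount_action` function are hypothetical names, and the two flags stand in for "the block holds only a cleanmarker" and "the summary marker is present".

```python
from enum import Enum


class Action(Enum):
    SKIP = "free block, nothing to scan"
    READ_SUMMARY = "read only the summary node"
    FULL_SCAN = "scan every node in the block"


def mount_action(has_only_cleanmarker: bool, has_summary: bool) -> Action:
    """Pick the scan strategy for one erase block at mount time."""
    if has_only_cleanmarker:
        return Action.SKIP          # free block
    if has_summary:
        return Action.READ_SUMMARY  # fully filled block with a summary
    return Action.FULL_SCAN         # partially filled block, no summary
```

Only the partially filled blocks pay the cost of a full scan in this scheme.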

Post by Ferenc Havasi
Any other improvement can be done later, because after that the filesystem
will always be coherent: we write the summary only at the end of an
erase block, when the block is fully "finished" - so if there is a summary
somewhere we will not need to extend it, only to mark the obsolete nodes.

Yes, nice, but why do you need to mark obsolete nodes in the summary?
When you insert nodes into the fragtree or dirents into the list, the
JFFS2 code will detect obsolete nodes automatically; there is no need to
mark them physically.

Post by Ferenc Havasi
We also plan to implement, in the near future, the ability to generate
the summary dynamically when the filesystem finishes an erase block -
which will keep this "fast mount time" permanent.

This would be perfect.

Ferenc Havasi

2004-10-21 19:50:11 UTC

Hi Artem,

Post by Artem B. Bityuckiy
Unfortunately, you cannot mark entries as obsolete in your summary
node on NAND.
If you write your summary only for *full* blocks, you will not need to
mark entries obsolete, even on NOR flash (although on NOR you could).
Partially filled blocks must not have a summary node (you can
introduce a special marker, written to the OOB area of the last page on
NAND or to the last word of the sector on NOR, which tells whether a
summary node is present).

Really, you are right.

So we only have to solve this problem on NOR. I think the easiest
solution is to set jffs2_can_mark_obsolete() to false if the summary
support is enabled.

Bye,
Ferenc

Ferenc Havasi

2004-10-21 07:39:00 UTC

Hi Artem,

1. How large are your summary nodes (on average) for blocks full of
dirents/nodes?

It depends heavily on
- the size of the erase block
- the sizes of the nodes
It is 4 words for every jffs2_raw_inode. Dirents are stored separately
without a summary.

2. Why don't you use compression for them?

To make boot time as fast as possible :) But not a bad idea. If someone
needs it we can make a new option.

3. Why did you introduce a new tool instead of just adding new options to
mkfs.jffs2?

I think it is a "nicer", cleaner design, and with this separation
reordering the nodes is much easier.

Bye,
Ferenc

David Woodhouse

2004-10-21 13:24:20 UTC

Post by Ferenc Havasi
Dear All,
Here is the latest version of our mount time improvement.

It's looking good, but the kernel really needs to be able to write these
summaries for _itself_ in order to give a real improvement over the long
term. If the file system has to be read-only we might as well be using
cramfs, and if the summary becomes obsolete over time we might as well
not bother in a lot of cases.

--
dwmw2

Ferenc Havasi

2004-10-21 20:05:23 UTC

Post by David Woodhouse
It's looking good, but the kernel really needs to be able to write these
summaries for _itself_ in order to give a real improvement over the long
term. If the file system has to be read-only we might as well be using
cramfs, and if the summary becomes obsolete over time we might as well
not bother in a lot of cases.

Our plan for it:

We would like to store some additional information in the jeb struct:
- type information, where the type can be INODE_ONLY or
ANYTHING_OTHER. This information is easy to detect at mount time.
- a predicted summary size (calculated dynamically). It will be used to
decide when to generate the summary. Of course this applies only to
INODE_ONLY erase blocks.

If I am right, every node allocation is done by jffs2_reserve_space(). We
would like to modify it and introduce a new interface,
jffs2_reserve_space_for_inode(). Every function that stores an inode
(there are not too many, I think) should call
jffs2_reserve_space_for_inode() with some extra information (inode
number...).

jffs2_reserve_space() should use only ANYTHING_OTHER erase blocks, while
jffs2_reserve_space_for_inode() uses only INODE_ONLY ones. If there is no
free space in them, it should use the usual technique to find a clean
erase block and start storing the new node in it.

Generating the summary is also the task of
jffs2_reserve_space_for_inode(): if the new inode (plus summary) does not
fit in the erase block, it generates the summary.

What do you think?

Regards,
Ferenc

Artem Bityuckiy

2004-10-22 12:44:13 UTC

Hello Ferenc,

First, let me briefly describe your design to be sure I
understand it and that we are both thinking the same way.

Essentially, your design is based on the fact that you do not want to
reference directory entries in the summary nodes. Your motivation is that
you would keep almost a copy of the direntries in the summary, thus:
1. duplicating too much information.
2. you suppose there will be no mount speed acceleration.

So, for this purpose you are going to distribute the inode nodes and
the other nodes (including direntry nodes) across different blocks. The
blocks that contain only inode nodes will have summaries; the other
blocks will not.

I think this is not the best solution. Why? In general, because I do not
like the following:
A. Your idea to distribute inode nodes and other nodes between different
blocks.
B. Your assumption that the directory information in summaries will not
affect the mount time.

The following are reasons concerning the item A.

1. Your change will affect JFFS2 very heavily. You will introduce a
restriction into JFFS2. Other improvements may not work with such a
restriction. Now all the blocks are equivalent. But you want to
distinguish between two kinds of blocks. Don't you think this is too
complicated a decision?

2. Think about wear-leveling. In JFFS it was ideal. In JFFS2 it is
good, but not so ideal. On average, the inode nodes are changed more
often (just think about FIFOs, which we discussed on this list
recently). So you will need to garbage collect the INODE_ONLY blocks
more often. So I am afraid wear-leveling will suffer from your
improvement.

3. Imagine a file system with *lots* of very small files. In this case,
the direntries' share of the media will be fairly large, and the
mount time of such a file system will not improve much.

4. It seems to me you will need to increase the number of blocks
reserved for garbage collection (double it?). This is also a minor
drawback.

The following are reasons concerning the item B.

I believe that if we have directory references in summaries, this will
increase the mount speed.

1. First, we will store less data! We don't need to keep the common
headers, CRCs and mctimes.
2. Second, we may compress the summary (direntries aren't compressed)!
3. Third, on NAND there is a difference between reading lots of
different pages and reading a few pages.

I propose another design.

1. Keep direntry references in summaries too and hence do not
distinguish between blocks with inode nodes and direntries.
2. Compress summaries.

So you will avoid a lot of problems related to teaching the GC to
distinguish between different blocks. This will be more natural. I
believe summaries must reference *any* node in the block. This is a
simpler and cleaner design.

Why do you not like this?

I see only one potential problem: direntries may have long names (up to
255 characters). This may lead to large summaries.

But in this case we may:
1. Improve JFFS2 itself. Keep, say, only 20 characters in the
full_dirent structure. Most direntries will fit. For the others, we will
just read the flash.
2. Leave JFFS2 untouched and keep only 20 characters in summaries. For
the other direntries, we may read them from flash (keeping their flash
offsets instead of names).

Comments?

--
Best regards, Artem B. Bityuckiy
Oktet Labs (St. Petersburg), Software Engineer.
+78124286709 (office) +79112449030 (mobile)
E-mail: ***@oktetlabs.ru, web: http://www.oktetlabs.ru

Ferenc Havasi

2004-10-25 09:36:29 UTC

Hi Artem,

Post by Artem Bityuckiy
So, for this purpose you are going to distribute the inode nodes and
the other nodes (including direntry nodes) across different blocks. The
blocks that contain only inode nodes will have summaries; the other
blocks will not.

Yes, I think there are three kinds of nodes:
- type A contains a significant amount of data which is not needed at
mount time (jffs2_raw_inode)
- type B is (almost) fully needed at mount time (jffs2_raw_dirent)
- type C is any other (unknown, future developments...)

To achieve as much mount time speed-up as possible I think we should
distinguish them.

With a summary, the really significant speed-up comes only from type A
nodes. We can also generate a summary for type B, but then (as you
wrote) a significant share of the information will be duplicated.

So we would like to introduce two kinds of erase blocks:
- erase blocks with a summary: they will store (for now only) type A
nodes, maybe later some of type B
- erase blocks without a summary: they will store all type C nodes and
those type B nodes which are not stored in the former

Post by Artem Bityuckiy
1. Your change will affect JFFS2 very heavily. You will introduce a
restriction into JFFS2. Other improvements may not work with such a
restriction. Now all the blocks are equivalent. But you want to
distinguish between two kinds of blocks. Don't you think this is too
complicated a decision?

What kind of restriction do you mean? We don't introduce any
restrictions. The "type C" kind of nodes are processed as before, using
the usual scanning method. If you want to force every node to have
a representation in the summary, that would be a restriction.

I think for some kinds of nodes a summary is meaningful, and for some
kinds it is not.

If we mix them, that can be a very big slowdown if you want to process
them only by storing a reference to their offset in the summary, because
if you (for example) want to read only 50 bytes (the size of the node) you
will have to read 512/2048 bytes depending on the flash (where mostly
there will be inode nodes, which do not need to be read because they are
in the summary).

But if all of these "not summarized, small" nodes are stored in a
"separate" erase block, then this 512/2048-byte read will not be
wasted (because the remaining 462-1998 bytes will also hold
relevant information which is not in the summary).
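
The 462-1998 range above is simply the page size minus the 50-byte example node. A minimal check, where the `unused_bytes` helper is a hypothetical name used only for this illustration:

```python
def unused_bytes(page_size: int, node_size: int) -> int:
    """Bytes read from a page but not needed when only one node is wanted."""
    return page_size - node_size


NODE_SIZE = 50  # example node size from the message above

for page_size in (512, 2048):
    print(f"{page_size}-byte page: {unused_bytes(page_size, NODE_SIZE)} bytes besides the node")
```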

Post by Artem Bityuckiy
2. Think about wear-leveling. In JFFS it was ideal. In JFFS2 it is
good, but not so ideal. On average, the inode nodes are changed more
often (just think about FIFOs, which we discussed on this list
recently). So you will need to garbage collect the INODE_ONLY blocks
more often. So I am afraid wear-leveling will suffer from your
improvement.

I think the GC solves it "automatically". This mark
(SUMMARIZED/NOT_SUMMARIZED) is not a permanent thing; it is assigned
"pseudo-randomly".

I agree that it causes somewhat different behavior in wear-leveling, but I
don't think it makes it significantly worse.

Post by Artem Bityuckiy
4. It seems to me you will need to increase the number of blocks
reserved for garbage collection (double it?). This is also a minor
drawback.

I don't understand what you mean here.

Post by Artem Bityuckiy
I believe that if we have directory references in summaries, this will
increase the mount speed.
1. First, we will store less data! We don't need to keep the common
headers, CRCs and mctimes.
2. Second, we may compress the summary (direntries aren't compressed)!
3. Third, on NAND there is a difference between reading lots of
different pages and reading a few pages.

Yes, we should try it - storing dirents in SUMMARIZED erase blocks. But
it can be a later improvement; first we need a stable, well-working
system - and this is urgent for us now.

Post by Artem Bityuckiy
2. Compress summaries.

It makes it harder to determine the optimal time for summary generation
(it is easy to see the summary size, but here it is the compressed size
that matters). It can give a smaller image but may cause some slowdown,
too. We may introduce it later as an option.

So now we have two open discussions:
- is the SUMMARIZED / NOT_SUMMARIZED distinction good or not
- do we need dirents in the summary in the first version or not

Fortunately the effects (and side effects) of these improvements will be
active only if the new kernel option is enabled, and they don't rule out
any other future improvements.

I am curious about (at least) David's opinion on these topics.

Bye,
Ferenc

Artem Bityuckiy

2004-10-25 10:56:29 UTC

Hello Ferenc,

Post by Ferenc Havasi
- type A contains a significant amount of data which is not needed at
mount time (jffs2_raw_inode)
- type B is (almost) fully needed at mount time (jffs2_raw_dirent)
- type C is any other (unknown, future developments...)
To achieve as much mount time speed-up as possible I think we should
distinguish them.

This is what I really do not like.

OK, let us discuss only this topic for now. Let me explain why I believe
it is bad and very *unnatural* to introduce two or more kinds of blocks.

An example of a JFFS2 change that I consider natural is the introduction
of a new node type. It is natural because when JFFS2 was designed,
this possibility was foreseen and taken into account. It is relatively
easy to do, and it is possible to do it without affecting other
things in JFFS2.

Conversely, introducing several block types was not foreseen in the
JFFS2 design, and everything in JFFS2 is coded with the assumption
that all blocks are equivalent.

This is my point of view on the issue in general.

Now I will try to illustrate why I think so.

1. In JFFS2 there are several lists of blocks - clean_list, dirty_list,
very_dirty_list. Are you going to introduce clean_list_typeA,
dirty_list_typeA, very_dirty_list_typeA, clean_list_typeB,
dirty_list_typeB, very_dirty_list_typeB?

2. Just do 'grep "_list" * | grep -e "$dirty$\|$very$"' and see how
many places there are in JFFS2 where these lists are changed. Do you
think it is natural to introduce 3 more lists? I believe not. What if
somebody else introduces one more type of block?

3. There is a write buffer in JFFS2 which is used in the case of NAND.
Are you going to have two wbufs? This is also a significant change.

4. Now the GC just takes one block and moves all the valid nodes to
another one. In your case (if you have a JFFS2 image which was created
by older code, without your patch, where all node types are mixed),
you will need to move one type of node to one block and the other type
to another block.

So I think you will need to change many things in JFFS2. You
risk opening a can of worms.

So, do you agree that this change is *unnatural*?

===================================================================

Post by Ferenc Havasi

Post by Artem Bityuckiy
4. It seems to me you will need to increase the number of blocks
reserved for garbage collection (double it?). This is also a minor
drawback.

I don't understand what you mean here

I mean the sb->resv_blocks_gcmerge and related. You will need to
increase it, which is not very good.

Ferenc Havasi

2004-10-25 15:30:20 UTC

Hi Artem,

Post by Artem Bityuckiy

Post by Ferenc Havasi
To achieve as much mount time speed up as possible I think we should
distinguish them.

This is what I really do not like.
OK, let us discuss only this topic for now. Let me explain why I believe
it is bad and very *unnatural* to introduce two or more kinds of blocks.

You are right, it may be unnatural from the point of view of JFFS2's
original design. But from the point of view of how this optimization
connects to JFFS2, I think it is more natural than simply storing offsets
in the summary, or copying all the information into it.

Our plan was to modify wbuf (add a second one) and modify
jffs2_reserve_space to select the right wbuf and generate the summary. We
never planned to introduce new clean_*, dirty_*, ... lists; that is
really too difficult.

Post by Artem Bityuckiy
3. There is a write buffer in JFFS2 which is used in the case of NAND.
Are you going to have two wbufs? This is also a significant change.

Yes, we started to implement it yesterday and now we agree. It is really
not easy, and we don't want to rewrite the NAND handling part of JFFS2
without a real NAND device. Maybe in the design of JFFS3 :)

So you convinced me. We will change the design of summary. The inodes
and dirents will be also in the summary. All other nodes will be copied
as they are into the summary and cause a warning. Summary support will
also be required for new node types.

In the kernel we will have to modify
1. jffs2_scan_eraseblock(), as it is already in our patch
2. jeb struct, to store the dynamically generated summary (one extra field)
3. jffs2_reserve_space(), which will have a new parameter (summary
size), which can be JFFS2_SUMMARY_INODE_SIZE or
JFFS2_SUMMARY_DIRENT_SIZE(namelen). It can decide when to generate
summary and it can do this generation.
4. jffs2_flash_writev(), which is used to write info to flash. It can
parse the node (similar to sumtool) and store the summary of it in its jeb.

If it works we'll check the effect of compressing the summary. (size and
speed)

Comments?

Bye,
Ferenc

P.s.: Thanks for this good conversation.

Artem Bityuckiy

2004-10-26 09:59:27 UTC

Hello Ferenc,

Post by Ferenc Havasi
In the kernel we will have to modify
1. jffs2_scan_eraseblock(), as it is already in our patch
2. jeb struct, to store the dynamically generated summary (one extra field)

IMHO, since the summary relates to only one block, the current block, it
is logical to reference the summary from jffs2_sb_info, not from
jffs2_erase_blocks. It is also not very nice to store it in
jffs2_erase_blocks, since that will increase the size of the array of
JFFS2 blocks (c->blocks[]).

Post by Ferenc Havasi
3. jffs2_reserve_space(), which will have a new parameter (summary
size), which can be JFFS2_SUMMARY_INODE_SIZE or
JFFS2_SUMMARY_DIRENT_SIZE(namelen). It can decide when to generate
summary and it can do this generation.

Yes, I also think so.

Currently jffs2_do_reserve_space() does the following (as I understand it):
1. If the current block (c->nextblock) has enough space for the
request, it reserves it.
2. If c->nextblock has less space than requested, c->nextblock
is wasted and put on the corresponding list (dirty_list, etc.), and a
free block is taken and reserved.

Thus jffs2_do_reserve_space() should be improved so that it keeps back
some space for the summary. We also need some function like
jffs2_write_summary(), which will be called before
jffs2_do_reserve_space() takes a new block from the free_list.
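
The reservation logic proposed here can be modelled in a few lines of Python. This is a sketch under stated assumptions, not kernel code: `Block` and `reserve` are hypothetical stand-ins for the erase block structure and jffs2_do_reserve_space(), and "writing the summary" is reduced to accounting for its size.

```python
class Block:
    def __init__(self, size):
        self.size = size
        self.used = 0            # bytes taken by nodes already reserved
        self.summary_len = 0     # bytes the pending summary would occupy
        self.has_summary = False

    def free(self):
        return self.size - self.used


def reserve(current, free_list, closed, node_len, entry_len):
    """Reserve node_len bytes, keeping room for the block's summary.

    entry_len is the size of the summary entry describing the new node.
    Returns the block that now holds the reservation.
    """
    needed = node_len + current.summary_len + entry_len
    if current.free() < needed:
        # The node plus the grown summary no longer fits: write the summary
        # into the space kept back for it, retire the block, take a new one.
        current.used += current.summary_len
        current.has_summary = True
        closed.append(current)
        current = free_list.pop(0)
    current.used += node_len
    current.summary_len += entry_len
    return current
```

The invariant is that `used + summary_len` never exceeds the block size, so the summary can always be written when the block is closed.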

Post by Ferenc Havasi
4. jffs2_flash_writev(), which is used to write info to flash. It can
parse the node (similar to sumtool) and store the summary of it in its jeb.

Maybe right here... I haven't thought about it much... Or maybe, as I
wrote, in jffs2_do_reserve_space()...

I also suggest including direntries in summaries and compressing them. See:

sizeof(struct jffs2_raw_dirent) = 40 (without name)
you will need to store in your summary only:

totlen
pino
version
ino
nsize
type
name

which is 24 bytes. You don't store all data! Of course, in case of long
names things are not so good...
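
The sizes can be checked with Python's `struct` module. The sketch below packs the fixed part of jffs2_raw_dirent and the fields listed above without padding; the byte order is chosen arbitrarily, since only the sizes matter here, and the `RAW_DIRENT` and `SUM_DIRENT` names are illustrative. Packed this way the listed fixed fields come to 18 bytes before the name, so the 24-byte figure above presumably allows for alignment or extra bookkeeping that the message does not spell out.

```python
import struct

# Fixed part of struct jffs2_raw_dirent (the name follows it on flash):
# magic, nodetype, totlen, hdr_crc, pino, version, ino, mctime,
# nsize, type, unused[2], node_crc, name_crc
RAW_DIRENT = struct.Struct("<HHIIIIIIBB2sII")

# Only the fields listed above, packed without padding (an assumption):
# totlen, pino, version, ino, nsize, type (the name follows)
SUM_DIRENT = struct.Struct("<IIIIBB")

print("raw dirent header:", RAW_DIRENT.size, "bytes")
print("summary entry, fixed part:", SUM_DIRENT.size, "bytes")
```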

If you also compress them, they will be smaller (by 50-70%)!

So, if there are few direntries in a block, why not store them in the summary?

Did you measure the summary decompression time on your system? I can't
know for sure, but I suspect that if you have, say, a 200MHz system, the
decompression time = o(time of block read)!

There is one more issue: if there are too many direntries in a block, the
summary may become too large (compression helps here). In this case
you may skip writing the summary or leave direntries out of it.

Ferenc Havasi

2004-10-26 10:21:12 UTC

Hi Artem,

Post by Artem Bityuckiy
IMHO, since the summary relates to only one block, the current block, it
is logical to reference the summary from jffs2_sb_info, not from
jffs2_erase_blocks. It is also not very nice to store it in
jffs2_erase_blocks, since that will increase the size of the array of
JFFS2 blocks (c->blocks[]).

Is it certain that there is only one non-full erase block in the
filesystem? Non-full here means that there are already some nodes in
it, but there is also some free space at the end of it.

Post by Artem Bityuckiy

Post by Ferenc Havasi
4. jffs2_flash_writev(), which is used to write info to flash. It can
parse the node (similar to sumtool) and store the summary of it in its jeb.

Maybe right here... I haven't thought about it much... Or maybe, as I
wrote, in jffs2_do_reserve_space()...

As I see it, jffs2_do_reserve_space() is called before the inode/...
allocation in most cases. So at that time the summary information is not
known - but at write time it must certainly be known.

Post by Artem Bityuckiy
So, if there are few direntries in a block, why not store them in the summary?

You may have misunderstood me. In the previous letter I wrote: "So you
convinced me. We will change the design of summary. The inodes and
dirents will be also in the summary."

So now we do plan to store dirents in the summary. :)

Post by Artem Bityuckiy
Did you measure the summary decompression time on your system? I can't
know for sure, but I suspect that if you have, say, a 200MHz system, the
decompression time = o(time of block read)!

It depends on the compressor.

We will test it with zlib/rtime. I would like to implement it as an
optional feature.

Post by Artem Bityuckiy
There is one more issue: if there are too many direntries in a block, the
summary may become too large (compression helps here). In this case
you may skip writing the summary or leave direntries out of it.

Let's see how it works, and afterwards we can optimize it :)

Bye,
Ferenc

Artem Bityuckiy

2004-10-26 11:05:50 UTC

Ferenc,

Post by Ferenc Havasi
Is it certain that there is only one non-full erase block in the
filesystem? Non-full here means that there are already some nodes in
it, but there is also some free space at the end of it.

I didn't analyse this accurately, but my vision is that there is one
current block (c->nextblock). Even GC moves nodes to it. This is because
the jffs2_do_reserve_space() is always used (even by GC), and the
jffs2_do_reserve_space() always uses c->nextblock.

Post by Ferenc Havasi
As I see it, jffs2_do_reserve_space() is called before the inode/...
allocation in most cases. So at that time the summary information is not
known - but at write time it must certainly be known.

Maybe... On the other hand, you could write the summary every time
jffs2_reserve_space() fetches a new block from the free_list...
Anyway, this is not fundamental...

Post by Ferenc Havasi
You may have misunderstood me. In the previous letter I wrote: "So you
convinced me. We will change the design of summary. The inodes and
dirents will be also in the summary."
So now we do plan to store dirents in the summary. :)

OK, sorry. :-)

Post by Ferenc Havasi
Let's see how it works, and afterwards we can optimize it :)

Agree :-)

Also, please, take into account that there may be checkpoint nodes (I'm
implementing this). So, I think you need to have a generic mechanism to
add new node types to your summary.

Also, I think it is good to have a generic mechanism to just refer some
nodes from summaries (for example, direntries with long names or
something else).

Thank you for conversation too.
:-)

Ferenc Havasi

2004-10-26 13:52:20 UTC

Hi Artem,

Post by Artem Bityuckiy
Also, please, take into account that there may be checkpoint nodes (I'm
implementing this). So, I think you need to have a generic mechanism to
add new node types to your summary.
Also, I think it is good to have a generic mechanism to just refer some
nodes from summaries (for example, direntries with long names or
something else).

Yes, it will be easy to extend.

We also need this general support, because we will introduce a new
node type too, because of the model file support, which we will start to
commit when David finishes his patch for Linus.

Bye,
Ferenc

Artem Bityuckiy

2004-10-25 11:21:28 UTC

Post by Ferenc Havasi
I am curious about (at least) David's opinion on these topics.

I also wonder why people are not very active :-)

Ferenc Havasi

2004-10-26 10:24:46 UTC

Hi Jarkko,

If dentries were stored just as they are (unstripped and uncompressed)
in the summary, the summary size would grow by 50% to about 3% of the
whole image size.

Thanks, good to know.

Did you get ECC/CRC errors? The most interesting test for me would be to
try the new (sumtool) image with the original kernel (because the
summary nodes are compatible it should work), and see whether there are
ECC/CRC errors or not.

Bye,
Ferenc

Artem Bityuckiy

2004-10-26 10:34:09 UTC

Hello Jarkko,

It is very good that direntries are distributed more or less uniformly
on average.

Ferenc's latest patch puts dentries on their own erase block in
consecutive order. Considering only the read efficiency from the
media, reading consecutive, uncompressed, and unstripped dentries from
a summary should cost no more than reading them from a dedicated erase
block.

Definitely true - the second patch must be better than the first one. But
unfortunately, it is hard to do this dynamically :-( Ferenc tried...

But in my proposal we will also reference direntries in the summary -
this is not the same as reading direntries from where they are placed;
it is a different thing, especially in the case of NAND! There is a
difference (if we have NAND) between reading one 512-byte NAND page
containing compressed information about 20-25 direntries and reading
20-25 *different* NAND pages.
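
The difference in page reads can be made concrete with a small sketch. The 20-byte compressed entry size is an assumption chosen so that 25 entries fit in one 512-byte page, as in the example above; the function names are hypothetical.

```python
import math


def pages_scattered(n_dirents: int) -> int:
    """Worst case: every dirent lives on a different NAND page."""
    return n_dirents


def pages_in_summary(n_dirents: int, entry_size: int, page_size: int = 512) -> int:
    """Pages needed when the dirent entries are stored back to back."""
    return math.ceil(n_dirents * entry_size / page_size)


for n in (20, 25):
    print(n, "dirents:", pages_scattered(n), "pages scattered,",
          pages_in_summary(n, entry_size=20), "page(s) via summary")
```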

So I think the new design will also be better than Ferenc's earlier patch :-)

Jarkko Lavinen

2004-10-26 09:29:28 UTC

I tried to see with jffs2dump how many inodes and dirents I have on the
root filesystem of an ARM testbed. Quick and dirty Perl script attached.
This isn't accurate, as the calculated total image size misses at least
the final padding on the last erase block.

The size of the plain JFFS2 image is 31.1 MiB. The root fs consists of
all applications and libraries and no user data.

$ jffs2dump -c rootfs.jffs2 | perl jffs2stats.pl
Number of dirents: 6144.
Total dirent node space: 304911 (0.9%)
Average dirent len: 49.6
Total dirent name space: 76671
Average name len: 12.5

Number of Inodes: 21197
Total Inode space: 32254866 (99.1%)
Average Inode size: 1521.7

Padding: 37326 0.1%
Total image size: 32559777
$ ls -l rootfs.jffs2
-rw-r--r-- 1 root root 32597104 Oct 20 15:11 rootfs.jffs2

With sumtool the image size grows to 31.8 MiB

$ jffs2dump -c rootfs-sum.jffs2 | perl jffs2stats.pl
Number of dirents: 6144.
Total dirent node space: 304911 (0.9%)
Average dirent len: 49.6
Total dirent name space: 76671
Average name len: 12.5

Number of Inodes: 21197
Total Inode space: 32254866 (97.2%)
Average Inode size: 1521.7

Number of Inode Summary nodes: 251
Total Inode Sum space: 631524, (1.9%)
Average Sum node size: 2516.0

Padding: 153063 0.5%
Total image size: 33191301
$ ls -l rootfs-sum.jffs2
-rw-r--r-- 1 root root 33423360 Oct 20 15:23 rootfs-sum.jffs2

If dentries were stored just as they are (unstripped and uncompressed)
in the summary, the summary size would grow by 50% to about 3% of the
whole image size.
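
This estimate can be reproduced from the jffs2dump statistics above. The sketch simply adds the total dirent node space to the inode summary space and ignores the small growth of the image itself, which is an approximation.

```python
# Numbers from the rootfs-sum.jffs2 statistics above (bytes).
image_size = 33191301
inode_sum_space = 631524
dirent_space = 304911

sum_with_dirents = inode_sum_space + dirent_space
growth_pct = 100.0 * dirent_space / inode_sum_space
share_pct = 100.0 * sum_with_dirents / image_size

print(f"summary grows by {growth_pct:.0f}% to {share_pct:.1f}% of the image")
```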

I tried Ferenc's earlier mount time patch in August and the 52s mount
time dropped then to 14s. If I understand right, inodes and dentries
were then mixed in the erase block and the summary was for inodes
only. This shows reading dentries from semirandom places is
expensive.

Ferenc's latest patch puts dentries on their own erase block in
consecutive order. Considering only the read efficiency from the
media, reading consecutive, uncompressed, and unstripped dentries from
a summary should cost no more than reading them from a dedicated erase
block.

Jarkko Lavinen