Changed since Thursday, June 10, 2004

Copyright (c) 2004 Iomega Corp.

Warranty Disclaimed

Linux fs/ udf/ Walk Thru

Welcome, bienvenido, bienvenue, benvenuto.

Here we quote a walk thru of the Linux fs/ udf/ code that ran after the vfs walk thru.  We hope the community will grow this work into commentary for the fs/ udf/ source code.

These cryptic notes aren't yet really English.  In blog, you can see we're looking to discover a translation to English.


notes:

udf_process_sequence never fails
check read or write rev should decide EFE UDF_FLAG_USE
check 16 bit chars (why does memcmp work)
check udf_tread in udf_update_inode

questions:

trace i/o

big

fsck
print metadata
mkudffs designed to read back in for fsck
test integrity
recovery

mount label hooks not yet maybe
journalling
alignment for CD-RW

overview

super.c
see dmesg
ls -lai
always case sensitive
truncating a file never puts it back into the inode

preallocate 8 when write over the end
e.g. always when appending

place new inode
at first free int of the bitmap after parent
something like
if exists

worm not complete

terminology
upper bits of AD length
R A = data
!R !A = hole
!R A = preallocation
NE
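
A minimal sketch of the terminology above, assuming the extent length field of an AD is 32 bits with the type in the upper two bits and the byte length in the lower 30. The macro values are as recalled from ecma_167.h and should be checked against the source; the helper names ad_type and ad_bytes are hypothetical.

```c
#include <stdint.h>

/* Upper two bits of the AD length field (values assumed from ecma_167.h). */
#define EXT_RECORDED_ALLOCATED         0x00000000u /* R A   = data          */
#define EXT_NOT_RECORDED_ALLOCATED     0x40000000u /* !R A  = preallocation */
#define EXT_NOT_RECORDED_NOT_ALLOCATED 0x80000000u /* !R !A = hole          */
#define EXT_NEXT_EXTENT_ALLOCDECS      0xC0000000u /* NE    = next extent   */

#define EXTENT_FLAG_MASK   0xC0000000u
#define EXTENT_LENGTH_MASK 0x3FFFFFFFu

/* Hypothetical helper: which of the four kinds is this AD? */
static uint32_t ad_type(uint32_t ext_length)
{
    return ext_length & EXTENT_FLAG_MASK;
}

/* Hypothetical helper: byte length with the type bits stripped. */
static uint32_t ad_bytes(uint32_t ext_length)
{
    return ext_length & EXTENT_LENGTH_MASK;
}
```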

dvd spec - udf 1.02 dvd interoperability
in effect max 1 GiB - 2 KiB per file
from 1 GiB - 1 byte limit
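
The arithmetic behind that limit, as a sketch. It assumes the 30 bit byte length of one AD, a 2 KiB block, and a file held in a single extent. The function name max_extent_bytes is hypothetical.

```c
#include <stdint.h>

/* Largest whole-block byte count that fits in a 30 bit AD length. */
static uint32_t max_extent_bytes(uint32_t block_size)
{
    uint32_t limit = (UINT32_C(1) << 30) - 1;   /* 1 GiB - 1 byte */
    return limit - (limit % block_size);        /* round down to a block */
}
```

With block_size of 2048 this gives 1 GiB - 2 KiB.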

mount -o options helps us focus on different test cases

quota keeps i_blocks summed, maybe works

include/linux/

udf_fs.h // UDFFS_VERSION, udf_debug, etc.
udf_fs_i.h // was included in linux/fs.h struct inode
udf_fs_sb.h // was included in linux/fs.h struct super_block

fs/udf/

udfdecl.h // prototypes, etc.
udf_i.h // access of inode private data
udf_sb.h // access of super_block private data

ecma_167.h // ecma_167
osta_udf.h // osta_udf
udfend.h // big vs. lil

super.c // udf_fill_super
inode.c // udf_iget
dir.c // udf_readdir
file.c // udf_file_write _adinicb udf_ioctl
namei.c // lookup create mknod rmdir ...
symlink.c // udf_symlink_filler
fsync.c // sync
truncate.c //

ialloc.c // create/destroy inode with UDF_I_ and call balloc.c
balloc.c // alloc/free of blocks - bitmaps, tables
directory.c // read/ write FID, AD even when split, verification
lowlevel.c // find last session and last written block
misc.c // extended attributes, tag etc. verification
partition.c // virtual partitions, sparing, block relocate, etc.

crc.c // UDF CRC = ITU-T V.41
udftime.c // Linux time vs. UDF time
unicode.c // OSTA Compressed Unicode etc.

$


--- 11:00am

super.c udf_fill_super

cd-r last block
udf_get_last_block
ioctl CDROM_LAST_WRITTEN

udf_check_valid
can be ignored mount -o novrs
udf_vrs
BEA NEA

udf_find_anchor - does not return pass/fail
- 0, - 2, - 150, - 152, ...
fixed packet size
32 blocks + 7 link blocks (link, run in, run out) = 39
lead blocks sometimes counted as lba's
varlastblock
converts between with and without
counting lead
TAG_IDENT_AVDP
multisession support
skip early stuff
try from the end in
dunno why
if (lastblock)
conversion of 256 is 312
$ dc -e '32 7 + 256 * 32 / p'
312
$
cf. mmc "fixed packet writing" cd-rw
vs. variable cd-r
unaware of cd-rw
reads fixed packet
as if variable
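
The dc line above can be sketched in C. The sketch assumes a packet of 32 user blocks followed by 7 link blocks, as in the dc expression; the function names are hypothetical and only resemble what fs/udf/ does.

```c
#include <stdint.h>

#define PACKET_USER_BLOCKS  32u
#define PACKET_LINK_BLOCKS  7u
#define PACKET_TOTAL_BLOCKS (PACKET_USER_BLOCKS + PACKET_LINK_BLOCKS)

/* lba that skips the link blocks -> lba that counts them */
static uint32_t fixed_to_variable(uint32_t lba)
{
    return (lba / PACKET_USER_BLOCKS) * PACKET_TOTAL_BLOCKS
         + (lba % PACKET_USER_BLOCKS);
}

/* lba that counts the link blocks -> lba that skips them */
static uint32_t variable_to_fixed(uint32_t lba)
{
    return (lba / PACKET_TOTAL_BLOCKS) * PACKET_USER_BLOCKS
         + (lba % PACKET_TOTAL_BLOCKS);
}
```

fixed_to_variable(256) is 312, matching the dc output.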
UDF_SB_ANCHOR is anchors found
indicates none found somehow
maybe array of four
N N-256 256 N-512
N-512 in non-closed disc
for reserved while open
cleared if not found
success is one or more not clear
three normal

s_magic written but not read

udf_load_partition - can fail

iterating over UDF_SB_ANCHOR
main & reserved descriptor sequence
happy with first we like

udf_process_sequence
maybe never fails, so maybe reserved never read

success breaks loop
first not-corrupt anchor is happy
find the VDS blocks we care about
last one wins e.g. for write-once disc
VDS_POS_...
maybe Windows emphasises volume label
UDF_SB_RECORDTIME timestamp

udf_load_logicalvol
finds the fileset maybe
UDF_SB_PART... per LVD
type 1
normal
type 2 normal in cd-r cd-rw
vat and sparable
parts can be out of sequence
PARTFUNC esp. for vat
physical from logical lba
if spared
look in the vat
logicalVolContentsUse
udf_load_logicalvolint
LVID
int = integrity
recursively find the last one
after loading the fileset

udf_load_partdesc
fetch UDF_SB_PARTFLAGS for future use maybe
space tables
unallocated table
...
free table
free bitmap
the table is an inode
update needed when writing
even fetch not needed to read
redundant ways of being free
where part starts
take the last and be happy maybe
offsets all lba's hereafter

more try to discover last block stuff
if VAT (= virtual allocation table)
successive versions written
VAT may or may not look like an inode
in successive revisions of udf specs

fail if no anchors, if no partitions
should catch any actual VRS failures

find stuff thru the LVID
the revision stuff
our support vs. disc
we never raise the minimum UDF_SB_UDFREV
per spec don't autoraise
decides EFE vs. FE etc.

next way of failing is no partitions

udf_find_fileset
finds the root inode
need udf, ecma not sufficient
find fileset, find space bitmap
at offset within partition
loop thru to find the latest
thru redirects if any
til terminator
written explicitly we hope,
else an unwritten block
such as zero
maybe error too
block layer objects

if mount without -o ro
udf_open_lvid
say the disc is unclean

inode = udf_iget(sb, rootdir)
this is the root inode
d_alloc_root

s_maxbytes = MAX_LFS_FILESIZE
maybe 32 bits of blocks
maybe 64 bits

error_out: udf_close_lvid

inode.c udf_iget

how many parts
vat has physical and virtual parts
virtual part only contains pointers to inodes
inodes reference physical always
parts can overlap
so lba doesn't map back

lb_addr is 48 bits = 16 bits part + 32 bits offset

udf_get_lb_pblock returns device lba
sb has translation for the one partition we have
ino
0 is the further offset

in effect more than 32 bits to address the inode
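
A sketch of the 48 bit address and its translation to a device lba. It covers only the plain (type 1) case where the partition is a contiguous run starting at a known block; vat and sparing are left out. The struct and function names are hypothetical stand-ins, not the kernel's.

```c
#include <stdint.h>

/* 48 bits on disc: 32 bits offset within the part + 16 bits part number. */
struct lb_addr_sketch {
    uint32_t logicalBlockNum;
    uint16_t partitionReferenceNum;
};

/* Hypothetical per-partition record: where the part starts on the device. */
struct part_sketch {
    uint32_t start;
};

/* Device lba = part start + block within part + further offset. */
static uint32_t lb_to_pblock(const struct part_sketch *parts,
                             struct lb_addr_sketch addr, uint32_t offset)
{
    return parts[addr.partitionReferenceNum].start
         + addr.logicalBlockNum + offset;
}
```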

inode = iget(sb, block)
fetch an inode from the inode cache
else construct an inode
iget thinks udf_read_inode read data
but actually private data
UDF_I_LOCATION reserved -1
no I haven't read yet
bail on trouble

catch haven't read yet via UDF_I_LOCATION == -1
correct UDF_I_LOCATION then

__udf_read_inode = wrapper udf_fill_inode

udf_read_ptagged of misc.c
from lbaddr
metadata reads verify
tag, location, checksum
tag version
descriptor crc
return identifier
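
A sketch of the tag checksum part of that verification. It assumes the 16 byte descriptor tag layout of ECMA 167, where byte 4 holds the checksum and the checksum is the sum modulo 256 of the other 15 bytes. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of tag bytes 0..15, skipping byte 4 (the checksum itself). */
static uint8_t tag_checksum(const uint8_t tag[16])
{
    uint8_t sum = 0;
    for (size_t i = 0; i < 16; i++)
        if (i != 4)
            sum = (uint8_t)(sum + tag[i]);
    return sum;
}

/* Nonzero when the recorded checksum matches the computed one. */
static int tag_checksum_ok(const uint8_t tag[16])
{
    return tag[4] == tag_checksum(tag);
}
```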

like FE, EFE, USE
USE maybe worm

strategyType 4096 is more worm
linked list of pairs of inodes
one block just points
indirection blank left blank
write once out of order = mo

udf_fill_inode
struct inode * inode
struct buffer_head * bh
copy of data from disc
in the buffer cache

mount -o uid=,gid= happens here

i_nlink
counts hard links
as if . and .. existed
hard links and sym links in the spec
special files
i_size
i_mode
udf permissions to fit Unix per spec
map to ugo+rwx
i_blocks
always count 512 = (1 << 9)
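
The i_blocks rule as a sketch: whatever the logical block size, the count is reported in units of 512 = (1 << 9) bytes. The function name is hypothetical, and blocksize_bits is assumed to be at least 9.

```c
#include <stdint.h>

/* Convert a count of logical blocks to a count of 512 byte units. */
static uint64_t blocks_to_512(uint64_t logical_blocks, unsigned blocksize_bits)
{
    return logical_blocks << (blocksize_bits - 9);
}
```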

convert between Linux and UDF time

update MAC time

UDF_I_ ... is inode private data
2.6 inode allocation pool
2.4 all inodes as big as the biggest
UDF_I_UNIQUE
incs from maintained max per spec
cf. `ls -lai`
we report lba
UDF_I_LENEATTR
e.g. block & character special files
offsets extent stuff afterwards
UDF_I_LENALLOC
i.e. all the extents
e.g. blocks that contain
SAD's, LAD's, etc.
tells you if you have room
for more data, for more AD's,
whatever

switch fe->icbTag.fileType
ICBTAG_DIRECTORY
S_IFDIR
i_nlink++
i_op, i_fop for dir
ICBTAG_FILE
i_data.a_ops for data in file or not
ext attr
we handle at least what we create
spec can be about as arbitrary as file
inline = in block
out of line = beyond block

super.c udf_put_super
cf. error_out of udf_fill_super
discs commonly have free or unallocated bitmaps, not the tables
for bitmaps
lazily fetch all blocks
into a pre-allocated array of pointers of buffer_head
vmalloc if > PAGE_SIZE else kmalloc

--- 12:51pm
--- mount and unmount we have seen
--- now `ls /`

dir.c udf_readdir
with string name
with dentry of parent
/ came from super.c udf_fill_super
inode = udf_iget(sb, rootdir)
d_alloc_root

call back with names via filldir

f_pos is position in ext2 dir
something like
0 is .
1 is ..
byte offset >> 2

do_udf_readdir
inode_bmap
calling it with byte position
converted to a block offset
got some readahead in there
maybe each time stepping past a boundary

read FID via udf_fileident_read
gives up to two buffer heads
start == end means one
if split across block copy into
char fname[UDF_NAME_LEN]

cope with osta compressed uni etc.
default utf-8
or mount with nls
udf_get_filename
length either way
often smaller e.g. 16 to 8 bits
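
A sketch of the name conversion, assuming the OSTA layout where the first byte is the compression id (8 or 16) and the rest is 8 bit or big endian 16 bit characters. This is not udf_get_filename: it ignores surrogates, nls and illegal character handling, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns bytes written (not counting the NUL), or -1 on bad input
 * or when out_cap is too small. */
static int cs0_to_utf8(const uint8_t *in, size_t in_len,
                       char *out, size_t out_cap)
{
    size_t step, o = 0;

    if (in_len == 0 || (in[0] != 8 && in[0] != 16))
        return -1;
    step = in[0] / 8;                       /* 1 or 2 bytes per char */

    for (size_t i = 1; i + step <= in_len; i += step) {
        uint32_t c = in[i];
        if (step == 2)
            c = (c << 8) | in[i + 1];       /* big endian */

        if (c < 0x80) {
            if (o + 1 >= out_cap) return -1;
            out[o++] = (char)c;
        } else if (c < 0x800) {
            if (o + 2 >= out_cap) return -1;
            out[o++] = (char)(0xC0 | (c >> 6));
            out[o++] = (char)(0x80 | (c & 0x3F));
        } else {
            if (o + 3 >= out_cap) return -1;
            out[o++] = (char)(0xE0 | (c >> 12));
            out[o++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    if (o >= out_cap)
        return -1;
    out[o] = '\0';
    return (int)o;
}
```

A 16 bit name of plain ASCII halves in length here, which is the "often smaller" case above.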

f_pos = nf_pos + 1
to leave room for .
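
The f_pos bookkeeping as a sketch, assuming FID's start on 4 byte boundaries so a byte offset can be kept as offset >> 2, with one added so that 0 stays free for ".". Both function names are hypothetical.

```c
#include <stdint.h>

/* byte offset of a FID within the dir -> f_pos reported to the vfs */
static int64_t fpos_from_byte_offset(int64_t byte_offset)
{
    return (byte_offset >> 2) + 1;
}

/* f_pos of 1 or more -> byte offset of the FID within the dir */
static int64_t byte_offset_from_fpos(int64_t f_pos)
{
    return (f_pos - 1) << 2;
}
```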

--- 1:10pm
--- now `ls -lai /`

namei.c udf_lookup
with string name
with dentry of parent

. and .. never get this far

stat etc.
same iteration as dir.c udf_readdir
but succeed only if match

to dentry
called only when dentry not found

udf_find_entry
same flow as readdir
handling for deleted and hidden
udf_match
memcmp
when we get a match
udf_iget = construct inode & fills it in
in the inode cache based on 32 bit key
d_add = corresponding dentry

--- 1:22pm
--- now `cat small.file`

udf_bmap reads from offset of file
generic_block_bmap
aop udf_readpage

[cancel this thread]

--- 1:29pm
--- small file write

file.c udf_file_write

struct file *, writes some bytes to somewhere
two cases, data in inode or not
O_APPEND shoves pos to end always

if data in inode does not fit, convert to other case

udf_expand_file_adinicb

grab_cache_page
get PAGE_SIZE from file
supposedly locked

kmap(page) gives pinned data
udf spec requires zeroed beyond data

UDF_I_DATA
exists when data is together with metadata
we keep the data so we can write it
whenever we write the inode

inode->i_data.a_ops->writepage

if data in inode does fit

update UDF_I_LENALLOC

mm/filemap.c generic_file_write
returns dirty for mark_inode_dirty
maybe for atime if atime per inode use

generic_file_aio_write_nolock
__grab_cache_page
udf_adinicb_readpage maybe not
though yes of course implied by mmap
a_ops->prepare_write
kmap(page)
filemap_copy_from_user...
a_ops->commit_write
udf_adinicb_commit_write
memcpy(UDF_I_
inode->i_size +=

udf_adinicb_writepage
notified of page is dirty
sufficient to copy inode->i_size

--- 2:00pm

udf_bmap (also readpage & writepage reach udf_get_block for page)
lba from offset in file
ioctl
generic_block_bmap
udf_get_block
choose
rediscover existing
udf_block_map
inode_bmap
udf_get_block
else alloc
inode_getblk

udf_get_block

eloc is LAD or SAD
length
location

udf_get_lb_pblock to get lba

FLAG_VARCONV

inode_getblk

called for a single whole block
always overlap with at most one preexisting AD
that AD
alloc
prealloc (often 8 blocks)
hole

worst case is three AD out of one

want to find previous, affected, next = c - 1, c, c + 1
laarr = LA array
search first for previous and current
often want to merge with previous
c = !c to exchange roles during search
fix up to be ascending

throughout maintain:

startnum is usually two, is what we found
endnum is what we have at the end
delete = (endnum < startnum)
insert = (startnum < endnum)

------------------

/// data = already R A

easy case is AD byte length is block multiple
can be
last byte length of the AD's may grow some
without changing the bitmap
exit here

[
udf having last AD not be multiple of block length
is alien
while open we grow to multiple
at close we shrink if need be
]

/// beyond end

(etype == -1) is we fell off the end of the list
add !R !A (= hole), pretend it existed before
so we can fall into !R !A case

want laarr to ascend

/// !R A = a preallocation

ready to write, be happy

/// else is !R !A = hole
we allocate
maybe out of sequence unless at end
try to be close

------------------ all cases but single block was R A found

udf_split_extents
have to split if we found more than one !R block
have to split into three if not first or last
grows endnum if need be
shift laarr as needed to stay ascending

#ifdef UDF_PREALLOCATE always is

udf_prealloc_extents
eight more if at end, then use up the eight

udf_merge_extents
spec requires min laarr
only allowed contiguous extents
are a series of max length in blocks til end
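
A simplified sketch of the merge rule. Neighbours merge when they have the same type and are either both holes or physically contiguous. This sketch only merges when the sum still fits in the 30 bit length as a whole number of blocks; as recalled, the kernel instead tops the first extent up to the max and leaves the rest in the second. All names here are hypothetical.

```c
#include <stdint.h>

#define LEN_MASK  0x3FFFFFFFu   /* low 30 bits: byte length */
#define FLAG_MASK 0xC0000000u   /* high 2 bits: extent type */
#define HOLE      0x80000000u   /* !R !A */

/* Hypothetical in-memory extent: start block plus flags|bytes. */
struct ext_sketch { uint32_t block; uint32_t ext_length; };

static uint32_t ext_blocks(uint32_t ext_length, uint32_t bs)
{
    return ((ext_length & LEN_MASK) + bs - 1) / bs;
}

static int can_merge(const struct ext_sketch *a,
                     const struct ext_sketch *b, uint32_t bs)
{
    uint32_t max = (LEN_MASK + 1) - bs;     /* largest block multiple */
    uint64_t total = (uint64_t)(a->ext_length & LEN_MASK)
                   + (b->ext_length & LEN_MASK);

    if ((a->ext_length & FLAG_MASK) != (b->ext_length & FLAG_MASK))
        return 0;
    if (total > max)
        return 0;
    if ((a->ext_length & FLAG_MASK) == HOLE)
        return 1;                           /* holes have no location */
    return b->block - a->block == ext_blocks(a->ext_length, bs);
}

/* Merge neighbours in place, shifting left; returns the new count. */
static int merge_extents(struct ext_sketch *e, int n, uint32_t bs)
{
    int i = 0;
    while (i + 1 < n) {
        if (!can_merge(&e[i], &e[i + 1], bs)) { i++; continue; }
        e[i].ext_length += e[i + 1].ext_length & LEN_MASK;
        for (int j = i + 1; j + 1 < n; j++)
            e[j] = e[j + 1];
        n--;
    }
    return n;
}
```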

udf_update_extents
inserting an extent is shift right
deleting an extent is shift left
ok to leave NE pre-allocated
except phg fsck rejects AED of zero
maybe readers/ we
choke over AD of zero
sequencing
delete if need be
udf_delete_aext
shift by one, again if need
leave behind empty except for NE
maybe
insert if need be
overwrite what remains

------------------ all cases but single block was R A found

udf_get_pblock
called after allocations complete

--- 3:00pm

--- 4:19pm

namei.c udf_write_fi

udf: dir is E/FE followed by FID's
FID's may cross block boundaries except for the initial tag

called whenever the FID changes
updates impuse fileident only on request

with

struct udf_fileident_bh * fibh
split across two possibly discontiguous blocks
starts in the first, ends in the second
split ends first at end
starts second at start

struct udf_fileIdentDesc *
cfi
all of the FID up to the name or so
sfi
is the fragment of the FID
found in the fibh

fits in the one, fits the other, or crosses
crosses goes into sfi and fibh->ebh
some values are negative

roughly in sequence
store *impuse
store *fileident
store alignment pad of the *fileident
calc udf_crc
start with 0, "add" all the pieces
store descriptor
in the other or crosses
can't be only the other
dirty inode if with inode, else just dirty data
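
A sketch of the 'start with 0, "add" all the pieces' step. It assumes the UDF CRC is the 16 bit CRC with polynomial 0x1021, initial value 0, no bit reflection, computed here bit by bit rather than by table. The function name is hypothetical; passing the previous result back in is what lets a FID split across two blocks be summed piece by piece.

```c
#include <stddef.h>
#include <stdint.h>

/* Feed len bytes into a running CRC; pass 0 as crc for the first piece. */
static uint16_t crc_itu_t_sketch(const uint8_t *data, size_t len, uint16_t crc)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

Summing "1234" and then "56789" gives the same result as summing "123456789" in one call.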

namei.c udf_add_entry

add dentry to dir
char conversion
search like udf read dir or lookup
looking for a blank or the end
reuse only if same size within padding
refuse to duplicate
udf_expand_dir_adinicb
comparable to file expansion
fi is where the new thing goes
construct the new thing
udf_new_tag
UDF_I_LENALLOC maintenance

namei.c udf_delete_entry
FID_FILE_CHAR_DELETED
UDF_FLAG_STRICT on by default vs. mount -o nostrict maybe

namei.c udf_create
udf_new_inode
udf_add_entry
cfi -> UDF_I_LOCATION(inode)
48 bits address of FE of inode
constructed by us as UDF_I_ when we construct inode
udf_write_fi
FID name already good from udf_add_entry
FID impuse not in use here
d_instantiate(dentry, inode);

namei.c udf_mknod
resembles udf_create but
also init_special_inode

namei.c udf_mkdir
resembles udf_create (includes parent to child) but
also link child to parent

namei.c empty_dir
test that a dir is empty

namei.c udf_rmdir is just unlink
udf_delete_entry
complain if nlink not two
hard link of dir doesn't happen
kernel will delete content of dir later
because nlink gone to zero

namei.c udf_unlink
opposite of udf_create

namei.c udf_symlink
create udf symlink
is a list of path components
/
.
..
other

one (or more) / is our path component separator

data with the inode usually, or not
inode creation can ask not ad in icb
based on mount flags
but then we only handle one block

char conversion

no check to see if symlink fits where it's going

namei.c udf_link
create a hard link

namei.c udf_rename
find the two and do the work

old file to new file
old file to old dir
old dir to new dir
have to rework ..
old dir to old dir
have to rework ..

--- 5:21pm

ls *.h *.c

symlink.c

udf_pc_to_char
get path string from list of path components
udf_symlink_filler
fetch path string from the inode
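
A sketch of turning a list of path components into a path string. It assumes the ECMA 167 component types 1 and 2 for root, 3 for "..", 4 for "." and 5 for a name. The struct is a simplified stand-in with names already converted to C strings, and the function is not udf_pc_to_char itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified path component. */
struct pc_sketch { uint8_t type; const char *name; };

/* Writes a NUL terminated path into out; returns its length. */
static size_t pc_to_path(const struct pc_sketch *pc, size_t n,
                         char *out, size_t cap)
{
    size_t o = 0;

    if (cap == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        const char *piece;
        size_t len;

        switch (pc[i].type) {
        case 1: case 2: o = 0; piece = ""; break;   /* root: restart at "/" */
        case 3: piece = ".."; break;
        case 4: piece = "."; break;
        case 5: piece = pc[i].name; break;
        default: continue;                          /* unknown: skip */
        }
        len = strlen(piece);
        if (o + len + 2 > cap)                      /* piece, '/', NUL */
            break;
        memcpy(out + o, piece, len);
        o += len;
        out[o++] = '/';                             /* component separator */
    }
    if (o > 1)
        o--;                    /* drop trailing '/', keep a lone "/" */
    out[o] = '\0';
    return o;
}
```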

fsync.c
udf_fsync_file is way of calling udf_fsync_inode
udf_fsync_inode writes (if dirty) beyond
udf_sync_inode writes just the block of the FE

balloc.c

should be able to fill disc

prealloc version
memscan looks for 8 from start, wraps at end
falls back to something smaller
pick up the contiguous free before
comparable to ext2 without much of a group idea
locality within the block of the bitmap is the group
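
A sketch of the search for 8 free blocks. It assumes a set bit means free, so a byte of 0xFF is eight free blocks in a row, and it uses memchr in place of the kernel's memscan. The function name is hypothetical, and the fall back to something smaller is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the first block of a byte aligned run of 8 free blocks,
 * searching from goal_byte to the end and then wrapping, or -1. */
static long find_8_free(const uint8_t *bitmap, size_t nbytes, size_t goal_byte)
{
    const uint8_t *hit;

    if (nbytes == 0)
        return -1;
    goal_byte %= nbytes;

    hit = (const uint8_t *)memchr(bitmap + goal_byte, 0xFF,
                                  nbytes - goal_byte);
    if (!hit)                                   /* wrap at end */
        hit = (const uint8_t *)memchr(bitmap, 0xFF, goal_byte);

    return hit ? (long)(hit - bitmap) * 8 : -1;
}
```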

table is a directory of free blocks

udf_prealloc_blocks
picks the right kind

udf spec maybe says
UNALLOC_BITMAP normal when there is only one
in addition FREED_BITMAP
MO allows erase but erase is expensive

file.c
udf_release_file
release the prealloc
may make last AD length not block multiple

--- 5:55pm

truncate.c

udf_discard_prealloc called by iput
discard the trailing prealloc always
i_size may end at a block a little before the end
do nothing more if so
before maybe someone else owned i_sem
de/alloc of data involves lock_super
maybe now never happens
else truncate the last AD to byte length

udf_truncate_extents
shrink or grow
shrink
discard past i_size
grow
makes hole

extent_trunc
shrink any extent, maybe to zero

--- 6:15pm

inode.c

for fs/udf/, dir not thru the page cache

above

udf_iget = from cache else udf_fill_inode

udf_bmap = reads from offset of file
udf_get_block = guts of udf_bmap
inode_getblk = guts of udf_get_block if it doesn't exist

from the top

udf_put_inode
udf_discard_prealloc within i_sem
lock_kernel gone when you block
udf_delete_inode
i_size = 0;
udf_truncate
udf_update_inode
size 0 remains on disc
udf_free_inode

udf_clear_inode
discard UDF_I_DATA on request

udf_writepage
udf_readpage
udf_prepare_write
all generic ways of calling udf_get_block

udf_expand_file_adinicb
see above

udf_expand_dir_adinicb
like udf_expand_file_adinicb except
no page cache
also each FID points to its own lba

udf_get_block
see above

udf_getblk - layer 2
calls udf_get_block
gets a buffer head told to be there and zeroed and dirty

inode_getblk
udf_split_extents
udf_prealloc_extents
udf_merge_extents
udf_update_extents
see above

udf_bread to read FID's of E/FE - layer 3 - called by namei.c
udf_getblk
ll_rw_block(READ
wait_on_buffer

udf_truncate
shrinks or grows a file
embedded in the inode or not
may call
block_truncate_page
maybe throws away
associated pages without flushing
udf_truncate_extents

__udf_read_inode

udf_fill_inode

udf_convert_permissions
Linux perms vs. UDF perms
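
A sketch of the UDF to Linux direction. It assumes UDF keeps five permission bits per class (execute, write, read, chattr, delete) in the order other, group, owner, so the rwx bits of each class sit at bits 0 to 2, 5 to 7 and 10 to 12. The function name is hypothetical; setuid, setgid and sticky come from other flags and are left out.

```c
#include <stdint.h>

/* Keep only rwx of each class and pack them as ugo+rwx. */
static unsigned udf_perms_to_mode(uint32_t permissions)
{
    return (permissions & 0007)           /* other: bits 0..2   */
         | ((permissions >> 2) & 0070)    /* group: bits 5..7   */
         | ((permissions >> 4) & 0700);   /* owner: bits 10..12 */
}
```

The chattr and delete bits have no ugo+rwx equivalent and are dropped by the masks.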

udf_write_inode - call udf_update_inode differently

udf_sync_inode - call udf_update_inode differently

udf_update_inode - sometime after mark_inode_dirty before umount
udf_tread
actually read it, then zero it
constructs a complete entirely fresh UDF E/FE
UDF_I_DATA preconstructed at this time
memcpy by E/FE
included in CRC
udf_iget
see above

udf_add_aext
appends an LAD, or SAD, or NE to AED, ...
wraps udf_write_aext ...

udf_write_aext - core - called by udf_add_aext and much else
actually constructs LAD/ SAD

udf_next_aext - core - called wherever
wraps udf_current_aext
follows thru NE

udf_current_aext - core - called wherever

udf_insert_aext
udf_delete_aext
shifters

inode_bmap

discover the aext that references a byte
which extent, which block of extent, ...

think udf_bmap
but no page cache etc.
loops udf_next_aext
sets up args to call other funcs to keep going
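
A sketch of the search that inode_bmap does, over an array rather than over on-disc AD's. It walks the extents in order, summing byte lengths until the one that covers the byte is found, and returns -1 when the byte is beyond the last extent. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory extent: start block plus flags|bytes. */
struct ext_sketch { uint32_t block; uint32_t ext_length; };

/* Returns the index of the extent holding byte pos, or -1 if none.
 * On success *blk_in_ext is the block offset within that extent. */
static int find_extent(const struct ext_sketch *e, size_t n, uint64_t pos,
                       unsigned blocksize_bits, uint32_t *blk_in_ext)
{
    uint64_t start = 0;                         /* file offset of e[i] */

    for (size_t i = 0; i < n; i++) {
        uint64_t len = e[i].ext_length & 0x3FFFFFFFu;
        if (pos < start + len) {
            *blk_in_ext = (uint32_t)((pos - start) >> blocksize_bits);
            return (int)i;
        }
        start += len;
    }
    return -1;                                  /* fell off the end */
}
```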

from above remember:

udf_bmap
udf_get_block
udf_block_map
inode_bmap
udf_get_block

also called by do_udf_readdir

trace read/ write of metadata

trace read
data in file found by udf_get_block
metadata via udf_bread & variants
super block and E/FE and NE read by udf_tread

trace write - not easy

trace dirty - annoyingly frequent

mark_buffer_dirty
mark_inode_dirty

lba's
udf_get_lb_pblock & variations
i_ino
2.2.6.4
xpdf page 35 of 165 in udf250.pdf = "page 29"

Disclaimer of Warranty

Iomega Corp. Disclaimer of Warranty.

This information is provided above on an "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, either express or implied, including, without limitation, any warranties of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using the information and assume any risks associated with your use of such information. By using the information or software, you agree to assume any risks of the information being defective or inadequate to suit your needs, and you agree to waive any and all rights to make any claim whatsoever related to this information, including but not limited to any claim for damages (whether general or special, consequential, punitive, or for lost profits or any other damages), equitable relief, or loss of data. By using the information, you agree that you have not received any other representations or promises related to the information besides the information stated in this provision. In addition, this document does not grant you any additional rights or representations with regard to the underlying software described herein; any use of such software is governed by a separate, applicable license, your acceptance of which is a pre-condition to your use of such software.