1. 29 Apr, 2013 3 commits
  2. 25 Apr, 2013 1 commit
  3. 27 Mar, 2013 1 commit
    • jbd: don't wait (forever) for stale tid caused by wraparound · e678a4f0
      Jan Kara authored
      In the case where an inode has a very stale transaction id (tid) in
      i_datasync_tid or i_sync_tid, it's possible that after a very large
      (2**31) number of transactions, the tid number space might wrap,
      causing tid_geq()'s calculations to fail.
      
      Commit d9b01934 "jbd: fix fsync() tid wraparound bug" attempted to fix
      this problem, but it only avoided kjournald spinning forever by fixing
      the logic in jbd_log_start_commit().
      Signed-off-by: Jan Kara <jack@suse.cz>
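A minimal userspace sketch of why the comparison breaks down, assuming 32-bit tids compared through a signed difference in the style of tid_geq(). The helper stale_tid_compares_old() is hypothetical and exists only for illustration; the cast from unsigned to signed relies on the usual two's-complement behaviour of common compilers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tid_t;

/* Signed-difference comparison in the style of jbd's tid_geq(). */
static inline int tid_geq(tid_t x, tid_t y)
{
	int32_t difference = (int32_t)(x - y);
	return difference >= 0;
}

/*
 * Hypothetical helper (not a kernel function): does a tid recorded `age`
 * commits ago still compare as "not newer than" the current tid?
 */
static inline int stale_tid_compares_old(tid_t current, uint32_t age)
{
	tid_t stored = current - age;	/* unsigned arithmetic wraps mod 2**32 */
	return tid_geq(current, stored);
}

static inline void tid_wraparound_example(void)
{
	/* A recently recorded tid compares as old, as expected. */
	assert(stale_tid_compares_old(1000, 10));
	/* Just under 2**31 commits later the comparison still holds. */
	assert(stale_tid_compares_old(1000, 0x7fffffffu));
	/* At 2**31 commits the stale tid suddenly looks like a future tid. */
	assert(!stale_tid_compares_old(1000, 0x80000000u));
}
```

Once the stored tid is 2**31 or more commits behind, a waiter that trusts this comparison may wait for a transaction that will not come for a very long time.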
  4. 14 Jan, 2013 1 commit
    • jbd: don't wake kjournald unnecessarily · 7e2fb2d7
      Eric Sandeen authored
      Don't send an extra wakeup to kjournald in the case where we
      already have the proper target in j_commit_request, i.e. that
      transaction has already been requested for commit.
      
      commit d9b01934 "jbd: fix fsync() tid wraparound bug" changed
      the logic leading to a wakeup, but it caused some extra wakeups
      which were found to lead to a measurable performance regression.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
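The wakeup logic described above can be modelled roughly as follows. This is a reduced userspace sketch, not the kernel function: struct journal_model and model_start_commit() are hypothetical names, and the wakeup is simulated by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tid_t;

/* Hypothetical, heavily reduced stand-in for the journal state. */
struct journal_model {
	bool     has_running;     /* is there a running transaction? */
	tid_t    running_tid;     /* its tid, valid only if has_running */
	tid_t    commit_request;  /* last requested commit (cf. j_commit_request) */
	unsigned wakeups;         /* counts simulated kjournald wakeups */
};

/* Returns true if the commit thread was (notionally) woken. */
static inline bool model_start_commit(struct journal_model *j, tid_t target)
{
	/* The target has already been requested: skip the redundant wakeup. */
	if (j->commit_request == target)
		return false;

	/* Only the running transaction can be started: equality, not tid_geq(). */
	if (j->has_running && j->running_tid == target) {
		j->commit_request = target;
		j->wakeups++;
		return true;
	}

	/* Anything else is treated as an old (possibly wrapped) tid. */
	return false;
}

static inline void model_start_commit_example(void)
{
	struct journal_model j = { .has_running = true, .running_tid = 42,
				   .commit_request = 41, .wakeups = 0 };
	bool first = model_start_commit(&j, 42);  /* wakes the commit thread */
	bool again = model_start_commit(&j, 42);  /* already requested */
	bool stale = model_start_commit(&j, 7);   /* stale tid, nothing to start */

	assert(first && !again && !stale);
	assert(j.wakeups == 1 && j.commit_request == 42);
}
```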
  5. 23 Nov, 2012 1 commit
    • jbd: Fix lock ordering bug in journal_unmap_buffer() · 25389bb2
      Jan Kara authored
      Commit 09e05d48 introduced a wait for transaction commit into
      journal_unmap_buffer() in the case we are truncating a buffer undergoing commit
      in the page straddling i_size on a filesystem with blocksize < pagesize. Sadly
      we forgot to drop buffer lock before waiting for transaction commit and thus
      deadlock is possible when kjournald wants to lock the buffer.
      
      Fix the problem by dropping the buffer lock before waiting for transaction
      commit. Since we are still holding page lock (and that is OK), buffer cannot
      disappear under us.
      
      CC: stable@vger.kernel.org # Wherever commit 09e05d48 was taken
      Signed-off-by: Jan Kara <jack@suse.cz>
  6. 19 Nov, 2012 1 commit
  7. 12 Sep, 2012 1 commit
    • jbd: Fix assertion failure in commit code due to lacking transaction credits · 09e05d48
      Jan Kara authored
      ext3 users of data=journal mode with blocksize < pagesize were occasionally
      hitting assertion failure in journal_commit_transaction() checking whether the
      transaction has at least as many credits reserved as buffers attached.  The
      core of the problem is that when a file gets truncated, buffers that still need
      checkpointing or that are attached to the committing transaction are left with
      buffer_mapped set. When this happens to buffers beyond i_size attached to a
      page straddling i_size, subsequent write extending the file will see these
      buffers and as they are mapped (but underlying blocks were freed) things go
      awry from here.
      
      The assertion failure just coincidentally (and in this case luckily as we would
      start corrupting filesystem) triggers due to journal_head not being properly
      cleaned up as well.
      
      Under some rare circumstances this bug could even hit data=ordered mode users.
      There the assertion won't trigger and we would end up corrupting the
      filesystem.
      
      We fix the problem by unmapping buffers if possible (in lots of cases we just
      need a buffer attached to a transaction as a place holder but it must not be
      written out anyway). And in one case, we just have to bite the bullet and wait
      for transaction commit to finish.
      Reviewed-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
  8. 15 Aug, 2012 1 commit
    • jbd: don't write superblock when unmounting an ro filesystem · 2e84f264
      Jan Kara authored
      This sequence:
      
      results in an IO error when unmounting the RO filesystem. The bug was
      introduced by:
      
      commit 9754e39c
      Author: Jan Kara <jack@suse.cz>
      Date:   Sat Apr 7 12:33:03 2012 +0200
      
          jbd: Split updating of journal superblock and marking journal empty
      
      which lost some of the magic in journal_update_superblock() which
      used to test for a journal with no outstanding transactions.
      
      This is a port of a jbd2 fix by Eric Sandeen.
      
      CC: <stable@vger.kernel.org> # 3.4.x
      Signed-off-by: Jan Kara <jack@suse.cz>
  9. 04 Aug, 2012 1 commit
  10. 09 Jul, 2012 1 commit
  11. 15 May, 2012 3 commits
    • jbd: Write journal superblock with WRITE_FUA after checkpointing · fd2cbd4d
      Jan Kara authored
      If journal superblock is written only in disk's caches and other transaction
      starts reusing space of the transaction cleaned from the log, it can happen
      that blocks of a new transaction reach the disk before journal superblock. When
      power failure happens in such case, subsequent journal replay would still try
      to replay the old transaction but some of its blocks may be already
      overwritten by the new transaction. For this reason we must use WRITE_FUA when
      updating log tail and we must first write new log tail to disk and update
      in-memory information only after that.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • jbd: protect all log tail updates with j_checkpoint_mutex · 1ce8486d
      Jan Kara authored
      There are some log tail updates that are not protected by j_checkpoint_mutex.
      Some of these are harmless because they happen during startup or shutdown but
      updates in journal_commit_transaction() and journal_flush() can really race
      with other log tail updates (e.g. someone doing journal_flush() with someone
      running cleanup_journal_tail()). So protect all log tail updates with
      j_checkpoint_mutex.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • jbd: Split updating of journal superblock and marking journal empty · 9754e39c
      Jan Kara authored
      There are three cases of updating journal superblock. In the first case, we want
      to mark journal as empty (setting s_sequence to 0), in the second case we want
      to update log tail, in the third case we want to update s_errno. Split these
      cases into separate functions. It makes the code slightly more straightforward
      and later patches will make the distinction even more important.
      Signed-off-by: Jan Kara <jack@suse.cz>
  12. 11 Apr, 2012 1 commit
    • jbd: Refine commit writeout logic · 2db938be
      Jan Kara authored
      Currently we write out all journal buffers in WRITE_SYNC mode. This improves
      performance for fsync heavy workloads but hinders performance when writes
      are mostly asynchronous; most noticeably, it slows down readers and users
      complain about slow desktop response etc.
      
      So submit writes as asynchronous in the normal case and only submit writes as
      WRITE_SYNC if we detect someone is waiting for current transaction commit.
      
      I've gathered some numbers to back this change. The first is the read latency
      test. It measures time to read 1 MB after several seconds of sleeping in
      presence of streaming writes.
      
      Top 10 times (out of 90) in us:
      Before		After
      2131586		697473
      1709932		557487
      1564598		535642
      1480462		347573
      1478579		323153
      1408496		222181
      1388960		181273
      1329565		181070
      1252486		172832
      1223265		172278
      
      Average:
      619377		82180
      
      So the improvement in both maximum and average latency is massive.
      
      I've measured fsync throughput by:
      fs_mark -n 100 -t 1 -s 16384 -d /mnt/fsync/ -S 1 -L 4
      
      in presence of streaming reader. The numbers (fsyncs/s) are:
      Before		After
      9.9		6.3
      6.8		6.0
      6.3		6.2
      5.8		6.1
      
      So fsync performance seems unharmed by this change.
      Signed-off-by: Jan Kara <jack@suse.cz>
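The writeout policy from this commit message can be sketched as a small decision function. This is an illustration only: enum write_mode stands in for the kernel's WRITE and WRITE_SYNC, and waited_tid is a hypothetical record of the newest transaction anybody has waited on; the commit message does not say how the waiter is actually detected.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tid_t;

/* Stand-ins for the kernel's asynchronous WRITE and WRITE_SYNC modes. */
enum write_mode { MODE_ASYNC, MODE_SYNC };

/* Wraparound-tolerant ">=" on tids, via a signed difference. */
static inline int tid_geq(tid_t x, tid_t y)
{
	return (int32_t)(x - y) >= 0;
}

/*
 * waited_tid is a hypothetical field: the newest tid some task has
 * waited on. Submit synchronously only if that covers the transaction
 * being committed right now.
 */
static inline enum write_mode pick_write_mode(tid_t committing_tid,
					       tid_t waited_tid)
{
	return tid_geq(waited_tid, committing_tid) ? MODE_SYNC : MODE_ASYNC;
}

static inline void pick_write_mode_example(void)
{
	assert(pick_write_mode(10, 9) == MODE_ASYNC);  /* nobody waits on tid 10 */
	assert(pick_write_mode(10, 10) == MODE_SYNC);  /* someone waits on tid 10 */
}
```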
  13. 20 Mar, 2012 1 commit
  14. 13 Mar, 2012 1 commit
  15. 11 Jan, 2012 1 commit
    • jbd: Issue cache flush after checkpointing · 353b67d8
      Jan Kara authored
      When we reach cleanup_journal_tail(), there is no guarantee that
      checkpointed buffers are on stable storage - especially if buffers were
      written out by log_do_checkpoint(), they are likely to be only in disk's
      caches. Thus when we update journal superblock, effectively removing old
      transaction from journal, this write of superblock can get to stable storage
      before those checkpointed buffers which can result in filesystem corruption
      after a crash.
      
      A similar problem can happen if we replay the journal and wipe it before
      flushing disk's caches.
      
      Thus we must unconditionally issue a cache flush before we update journal
      superblock in these cases. The fix is slightly complicated by the fact that we
      have to get log tail before we issue cache flush but we can store it in the
      journal superblock only after the cache flush. Otherwise we risk races where
      new tail is written before appropriate cache flush is finished.
      
      I managed to reproduce the corruption using somewhat tweaked Chris Mason's
      barrier-test scheduler. Also this should fix occasional reports of 'Bit already
      freed' filesystem errors which are totally unreproducible but inspection of
      several fs images I've gathered over time points to a problem like this.
      
      CC: stable@kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
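The ordering requirement can be illustrated with a userspace analogue, assuming POSIX file descriptors in place of block devices. advance_log_tail() is a hypothetical function: data_fd stands for the checkpointed blocks, sb_fd for the journal superblock, and fsync() plays the role of the cache flush. The new tail is sampled by the caller before the flush and published only afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical analogue of "flush, then update the log tail".
 * Returns 0 on success, -1 on any I/O error.
 */
static inline int advance_log_tail(int data_fd, int sb_fd, uint32_t new_tail)
{
	/* 1. Make the checkpointed data durable first. */
	if (fsync(data_fd) != 0)
		return -1;
	/* 2. Only now publish the new tail in the "superblock". */
	if (pwrite(sb_fd, &new_tail, sizeof(new_tail), 0) !=
	    (ssize_t)sizeof(new_tail))
		return -1;
	/* 3. Make the tail record itself durable. */
	return fsync(sb_fd) == 0 ? 0 : -1;
}

static inline void advance_log_tail_example(void)
{
	char data_path[] = "/tmp/jbd_data_XXXXXX";
	char sb_path[] = "/tmp/jbd_sb_XXXXXX";
	int data_fd = mkstemp(data_path);
	int sb_fd = mkstemp(sb_path);
	uint32_t on_disk = 0;

	if (data_fd < 0 || sb_fd < 0)
		abort();
	if (write(data_fd, "checkpointed", 12) != 12)
		abort();
	if (advance_log_tail(data_fd, sb_fd, 7) != 0)
		abort();
	if (pread(sb_fd, &on_disk, sizeof(on_disk), 0) !=
	    (ssize_t)sizeof(on_disk))
		abort();
	assert(on_disk == 7);	/* the tail was written after the data flush */

	close(data_fd);
	close(sb_fd);
	unlink(data_path);
	unlink(sb_path);
}
```

If the first fsync() fails, the tail is never written, so the old journal contents remain reachable for replay.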
  16. 09 Jan, 2012 1 commit
    • jbd: Remove j_barrier mutex · 00482785
      Jan Kara authored
      j_barrier mutex is used for serializing different journal lock operations.  The
      problem with it is that e.g. FIFREEZE ioctl results in process leaving kernel
      with j_barrier mutex held which makes lockdep freak out. Also hibernation code
      wants to freeze filesystem but it cannot do so because it then cannot hibernate
      the system because of mutex being locked.
      
      So we remove j_barrier mutex and use direct wait on j_barrier_count instead.
      Since locking journal is a rare operation we don't have to care about fairness
      or such things.
      
      CC: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
  17. 06 Dec, 2011 1 commit
  18. 22 Nov, 2011 1 commit
    • jbd: clear revoked flag on buffers before a new transaction started · 8c111b3f
      Yongqiang Yang authored
      Currently, we clear revoked flag only when a block is reused.  However,
      this can trigger a false journal error.  Consider a situation when a block
      is used as a meta block and is deleted (revoked) in ordered mode, then the
      block is allocated as a data block to a file.  At this moment, user changes
      the file's journal mode from ordered to journaled and truncates the file.
      The block will be considered re-revoked by journal because it has revoked
      flag still pending from the last transaction and an assertion triggers.
      
      We fix the problem by keeping the revoked status more up to date - we clear
      revoked flag when switching revoke tables to reflect there are no revoked
      buffers in current transaction any more.
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
  19. 21 Nov, 2011 1 commit
    • freezer: unexport refrigerator() and update try_to_freeze() slightly · a0acae0e
      Tejun Heo authored
      There is no reason to export two functions for entering the
      refrigerator.  Calling refrigerator() instead of try_to_freeze()
      doesn't save anything noticeable or remove any race condition.
      
      * Rename refrigerator() to __refrigerator() and make it return bool
        indicating whether it scheduled out for freezing.
      
      * Update try_to_freeze() to return bool and relay the return value of
        __refrigerator() if freezing().
      
      * Convert all refrigerator() users to try_to_freeze().
      
      * Update documentation accordingly.
      
      * While at it, add might_sleep() to try_to_freeze().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Christoph Hellwig <hch@infradead.org>
  20. 01 Nov, 2011 1 commit
    • jbd/jbd2: validate sb->s_first in journal_get_superblock() · 8762202d
      Eryu Guan authored
      I hit a J_ASSERT(blocknr != 0) failure in cleanup_journal_tail() when
      mounting a fsfuzzed ext3 image. It turns out that the corrupted ext3
      image has s_first = 0 in journal superblock, and the 0 is passed to
      journal->j_head in journal_reset(), then to blocknr in
      cleanup_journal_tail(), in the end the J_ASSERT failed.
      
      So validate s_first after reading journal superblock from disk in
      journal_get_superblock() to ensure s_first is valid.
      
      The following script could reproduce it:
      
      fstype=ext3
      blocksize=1024
      img=$fstype.img
      offset=0
      found=0
      magic="c0 3b 39 98"
      
      dd if=/dev/zero of=$img bs=1M count=8
      mkfs -t $fstype -b $blocksize -F $img
      filesize=`stat -c %s $img`
      while [ $offset -lt $filesize ]
      do
              if od -j $offset -N 4 -t x1 $img | grep -i "$magic";then
                      echo "Found journal: $offset"
                      found=1
                      break
              fi
              offset=`echo "$offset+$blocksize" | bc`
      done
      
      if [ $found -ne 1 ];then
              echo "Magic \"$magic\" not found"
              exit 1
      fi
      
      dd if=/dev/zero of=$img seek=$(($offset+23)) conv=notrunc bs=1 count=1
      
      mkdir -p ./mnt
      mount -o loop $img ./mnt
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
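A minimal sketch of the kind of check this commit describes, written against a raw superblock buffer. It assumes the on-disk layout of a 12-byte journal header followed by big-endian s_blocksize, s_maxlen and s_first, which would put s_first at bytes 20..23 and is consistent with the reproducer zeroing byte offset+23. The function name and the upper bound against s_maxlen are assumptions, not taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a big-endian 32-bit on-disk field. */
static inline uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Assumed byte offsets within the journal superblock. */
#define SB_MAXLEN_OFF 16	/* s_maxlen: total blocks in the journal */
#define SB_FIRST_OFF  20	/* s_first: first block of log information */

/* Hypothetical validator: s_first must be nonzero and inside the journal. */
static inline bool journal_sb_first_is_valid(const unsigned char *sb,
					      size_t len)
{
	if (len < SB_FIRST_OFF + 4)
		return false;

	uint32_t maxlen = get_be32(sb + SB_MAXLEN_OFF);
	uint32_t first = get_be32(sb + SB_FIRST_OFF);

	return first != 0 && first < maxlen;
}

static inline void journal_sb_first_example(void)
{
	unsigned char sb[24] = { 0 };

	sb[18] = 0x20;	/* s_maxlen = 0x2000 blocks */
	sb[23] = 0x01;	/* s_first  = 1 */
	assert(journal_sb_first_is_valid(sb, sizeof(sb)));

	sb[23] = 0x00;	/* the corruption: low byte of s_first zeroed */
	assert(!journal_sb_first_is_valid(sb, sizeof(sb)));
}
```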
  21. 27 Jun, 2011 2 commits
    • jbd: Use WRITE_SYNC in journal checkpoint. · a212d1a7
      Tao Ma authored
      In journal checkpoint, we write the buffer and wait for it to finish.
      But in cfq, the async queue has a very low priority, and in our test,
      if there are too many sync queues and every queue is filled up with
      requests, the process will hang waiting for the log space.
      
      So this patch tries to use WRITE_SYNC in __flush_batch so that the request will
      be moved into sync queue and handled by cfq in a timely manner. We also use the
      new plug, so that all the WRITE_SYNC requests can be given as a whole when we unplug it.
      Reported-by: Robin Dong <sanbai@taobao.com>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • jbd: Fix oops in journal_remove_journal_head() · bb189247
      Jan Kara authored
      journal_remove_journal_head() can oops when trying to access journal_head
      returned by bh2jh(). This is caused for example by the following race:
      
      	TASK1					TASK2
        journal_commit_transaction()
          ...
          processing t_forget list
            __journal_refile_buffer(jh);
            if (!jh->b_transaction) {
              jbd_unlock_bh_state(bh);
      					journal_try_to_free_buffers()
      					  journal_grab_journal_head(bh)
      					  jbd_lock_bh_state(bh)
      					  __journal_try_to_free_buffer()
      					  journal_put_journal_head(jh)
              journal_remove_journal_head(bh);
      
      journal_put_journal_head() in TASK2 sees that b_jcount == 0 and buffer is not
      part of any transaction and thus frees journal_head before TASK1 gets to doing
      so. Note that even buffer_head can be released by try_to_free_buffers() after
      journal_put_journal_head() which adds even larger opportunity for oops (but I
      didn't see this happen in reality).
      
      Fix the problem by making transactions hold their own journal_head reference
      (in b_jcount). That way we don't have to remove journal_head explicitly via
      journal_remove_journal_head() and instead just remove journal_head when
      b_jcount drops to zero. The result of this is that [__]journal_refile_buffer(),
      [__]journal_unfile_buffer(), and __journal_remove_checkpoint() can free
      journal_head which needs modification of a few callers. Also we have to be
      careful because once journal_head is removed, buffer_head might be freed as
      well. So we have to get our own buffer_head reference where it matters.
      Signed-off-by: Jan Kara <jack@suse.cz>
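The reference-counting scheme described in this fix can be modelled in a few lines. This is a single-threaded userspace sketch with hypothetical names (struct jh_model, jh_file(), jh_unfile()); the real code also needs locking and a buffer_head reference, which are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, reduced model of a journal_head. */
struct jh_model {
	int  b_jcount;		/* reference count */
	bool in_transaction;	/* filed on some transaction list? */
};

static inline struct jh_model *jh_alloc(void)
{
	struct jh_model *jh = calloc(1, sizeof(*jh));

	if (!jh)
		abort();
	return jh;
}

static inline void jh_get(struct jh_model *jh)
{
	jh->b_jcount++;
}

/* Drop one reference; returns true if this call freed the object. */
static inline bool jh_put(struct jh_model *jh)
{
	assert(jh->b_jcount > 0);
	if (--jh->b_jcount == 0) {
		free(jh);
		return true;
	}
	return false;
}

/* Filing on a transaction takes the transaction's own reference. */
static inline void jh_file(struct jh_model *jh)
{
	jh->in_transaction = true;
	jh_get(jh);
}

/* Unfiling drops that reference, which may free the journal_head. */
static inline bool jh_unfile(struct jh_model *jh)
{
	jh->in_transaction = false;
	return jh_put(jh);
}

static inline void jh_refcount_example(void)
{
	struct jh_model *jh = jh_alloc();
	bool freed;

	jh_file(jh);		/* TASK1: transaction holds a reference */
	jh_get(jh);		/* TASK2: grabs the journal_head */
	freed = jh_put(jh);	/* TASK2: drops it; object stays alive */
	assert(!freed);
	freed = jh_unfile(jh);	/* TASK1: last reference, freed here */
	assert(freed);
}
```

Because the transaction's reference keeps the count above zero, a concurrent put in another task can no longer free the object while the committing task still uses it.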
  22. 25 Jun, 2011 3 commits
    • jbd: fix a bug of leaking jh->b_jcount · bd5c9e18
      Ding Dinghua authored
      journal_get_create_access should drop jh->b_jcount in error handling path
      Signed-off-by: Ding Dinghua <dingdinghua@nrchpc.ac.cn>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • jbd: remove dependency on __GFP_NOFAIL · 05713082
      Jan Kara authored
      The callers of start_this_handle() (or better ext3_journal_start()) are not
      really prepared to handle allocation failures. Such failures can for example
      result in silent data loss when it happens in ext3_..._writepage().  OTOH
      __GFP_NOFAIL is going away so we just retry allocation in start_this_handle().
      
      This loop is potentially dangerous because the oom killer cannot be invoked
      for GFP_NOFS allocation, so there is a potential for infinitely looping.
      But still this is better than silent data loss.
      Signed-off-by: Jan Kara <jack@suse.cz>
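The retry-instead-of-fail approach can be sketched in userspace as below. alloc_retry(), the backoff callback and the flaky_alloc() test double are all hypothetical; the commit message does not say how the kernel waits between attempts. As the message warns, the loop has no upper bound on iterations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t size);

/*
 * Keep retrying the allocation until it succeeds. `backoff` (may be NULL)
 * is called between attempts; `retries` (may be NULL) counts failures.
 * This can loop forever if the allocator never recovers.
 */
static inline void *alloc_retry(alloc_fn alloc, size_t size,
				void (*backoff)(void), unsigned *retries)
{
	void *p;

	while ((p = alloc(size)) == NULL) {
		if (retries)
			(*retries)++;
		if (backoff)
			backoff();
	}
	return p;
}

/* Test double: fails `fail_budget` times, then behaves like malloc(). */
static int fail_budget;

static inline void *flaky_alloc(size_t size)
{
	if (fail_budget > 0) {
		fail_budget--;
		return NULL;
	}
	return malloc(size);
}

static inline void alloc_retry_example(void)
{
	unsigned retries = 0;
	void *p;

	fail_budget = 3;
	p = alloc_retry(flaky_alloc, 64, NULL, &retries);
	assert(p != NULL && retries == 3);	/* three failures were absorbed */
	free(p);
}
```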
    • jbd: Add fixed tracepoints · 99cb1a31
      Lukas Czerner authored
      This commit adds fixed tracepoints for jbd. They are based on the fixed
      tracepoints for jbd2; however, those for collecting statistics are
      missing, since I think that will require a more intrusive patch, so it
      should have its own commit if someone decides that it is needed. Also
      there are new tracepoints in __journal_drop_transaction() and
      journal_update_superblock().
      
      The list of jbd tracepoints:
      
      jbd_checkpoint
      jbd_start_commit
      jbd_commit_locking
      jbd_commit_flushing
      jbd_commit_logging
      jbd_drop_transaction
      jbd_end_commit
      jbd_do_submit_data
      jbd_cleanup_journal_tail
      jbd_update_superblock_end
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Jan Kara <jack@suse.cz>
  23. 23 May, 2011 1 commit
  24. 17 May, 2011 3 commits
    • jbd/jbd2: remove obsolete summarise_journal_usage. · 9199e665
      Tao Ma authored
      summarise_journal_usage seems to have been obsolete for a long time,
      so remove it.
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • jbd: Fix forever sleeping process in do_get_write_access() · 2842bb20
      Jan Kara authored
      In do_get_write_access() we wait on BH_Unshadow bit for buffer to leave
      the shadow state. The waking code in journal_commit_transaction() has
      a bug because it does not issue a memory barrier after the buffer is moved
      from the shadow state and before wake_up_bit() is called. Thus a waitqueue
      check can happen before the buffer is actually moved from the shadow state
      and waiting process may never be woken. Fix the problem by issuing proper
      barrier.
      
      CC: stable@kernel.org
      Reported-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • jbd: fix fsync() tid wraparound bug · d9b01934
      Ted Ts'o authored
      If an application program does not make any changes to the indirect
      blocks or extent tree, i_datasync_tid will not get updated.  If there
      are enough commits (i.e., 2**31) such that tid_geq()'s calculations
      wrap, and there isn't a currently active transaction at the time of
      the fdatasync() call, this can end up triggering a BUG_ON in
      fs/jbd/commit.c:
      
      	J_ASSERT(journal->j_running_transaction != NULL);
      
      It's pretty rare that this can happen, since it requires the use of
      fdatasync() plus *very* frequent and excessive use of fsync().  But
      with the right workload, it can.
      
      We fix this by replacing the use of tid_geq() with an equality test,
      since there's only one transaction id that is valid for us to
      start: namely, the currently running transaction (if it exists).
      
      CC: stable@kernel.org
      Reported-by: Martin_Zielinski@McAfee.com
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
  25. 31 Mar, 2011 1 commit
  26. 17 Mar, 2011 1 commit
  27. 10 Mar, 2011 1 commit
    • block: kill off REQ_UNPLUG · 721a9602
      Jens Axboe authored
      With the plugging now being explicitly controlled by the
      submitter, callers need not pass down unplugging hints
      to the block layer. If they want to unplug, it's because they
      manually plugged on their own - in which case, they should just
      unplug at will.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  28. 28 Feb, 2011 1 commit
  29. 10 Dec, 2010 1 commit
  30. 27 Oct, 2010 2 commits