1. 17 Sep, 2014 40 commits
    • Ilya Dryomov's avatar
      libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly · a6489727
      Ilya Dryomov authored
      commit 5f740d7e1531099b888410e6bab13f68da9b1a4d upstream.
      
      Determining ->last_piece based on the value of ->page_offset + length
      is incorrect because length here is the length of the entire message.
      ->last_piece set to false even if page array data item length is <=
      PAGE_SIZE, which results in invalid length passed to
      ceph_tcp_{send,recv}page() and causes various asserts to fire.
      
          # cat pages-cursor-init.sh
          #!/bin/bash
          rbd create --size 10 --image-format 2 foo
          FOO_DEV=$(rbd map foo)
          dd if=/dev/urandom of=$FOO_DEV bs=1M &>/dev/null
          rbd snap create foo@snap
          rbd snap protect foo@snap
          rbd clone foo@snap bar
          # rbd_resize calls librbd rbd_resize(), size is in bytes
          ./rbd_resize bar $(((4 << 20) + 512))
          rbd resize --size 10 bar
          BAR_DEV=$(rbd map bar)
          # trigger a 512-byte copyup -- 512-byte page array data item
          dd if=/dev/urandom of=$BAR_DEV bs=1M count=1 seek=5
      
      The problem exists only in ceph_msg_data_pages_cursor_init(),
      ceph_msg_data_pages_advance() does the right thing.  The size_t cast is
      unnecessary.
      Signed-off-by: default avatarIlya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: default avatarSage Weil <sage@redhat.com>
      Reviewed-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a6489727
    • NeilBrown's avatar
      md/raid1,raid10: always abort recover on write error. · b08633de
      NeilBrown authored
      commit 2446dba03f9dabe0b477a126cbeb377854785b47 upstream.
      
      Currently we don't abort recovery on a write error if the write error
      to the recovering device was triggerd by normal IO (as opposed to
      recovery IO).
      
      This means that for one bitmap region, the recovery might write to the
      recovering device for a few sectors, then not bother for subsequent
      sectors (as it never writes to failed devices).  In this case
      the bitmap bit will be cleared, but it really shouldn't.
      
      The result is that if the recovering device fails and is then re-added
      (after fixing whatever hardware problem triggerred the failure),
      the second recovery won't redo the region it was in the middle of,
      so some of the device will not be recovered properly.
      
      If we abort the recovery, the region being processes will be cancelled
      (bit not cleared) and the whole region will be retried.
      
      As the bug can result in data corruption the patch is suitable for
      -stable.  For kernels prior to 3.11 there is a conflict in raid10.c
      which will require care.
      
      Original-from: jiao hui <jiaohui@bwstor.com.cn>
      Reported-and-tested-by: default avatarjiao hui <jiaohui@bwstor.com.cn>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      b08633de
    • Chris Mason's avatar
      xfs: don't zero partial page cache pages during O_DIRECT writes · d96dbb06
      Chris Mason authored
      commit 85e584da3212140ee80fd047f9058bbee0bc00d5 upstream.
      
      xfs is using truncate_pagecache_range to invalidate the page cache
      during DIO reads.  This is different from the other filesystems who
      only invalidate pages during DIO writes.
      
      truncate_pagecache_range is meant to be used when we are freeing the
      underlying data structs from disk, so it will zero any partial
      ranges in the page.  This means a DIO read can zero out part of the
      page cache page, and it is possible the page will stay in cache.
      
      buffered reads will find an up to date page with zeros instead of
      the data actually on disk.
      
      This patch fixes things by using invalidate_inode_pages2_range
      instead.  It preserves the page cache invalidation, but won't zero
      any pages.
      
      [dchinner: catch error and warn if it fails. Comment.]
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d96dbb06
    • Dave Chinner's avatar
      xfs: don't zero partial page cache pages during O_DIRECT writes · 1025b461
      Dave Chinner authored
      commit 834ffca6f7e345a79f6f2e2d131b0dfba8a4b67a upstream.
      
      Similar to direct IO reads, direct IO writes are using
      truncate_pagecache_range to invalidate the page cache. This is
      incorrect due to the sub-block zeroing in the page cache that
      truncate_pagecache_range() triggers.
      
      This patch fixes things by using invalidate_inode_pages2_range
      instead.  It preserves the page cache invalidation, but won't zero
      any pages.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1025b461
    • Dave Chinner's avatar
      xfs: don't dirty buffers beyond EOF · 3430681f
      Dave Chinner authored
      commit 22e757a49cf010703fcb9c9b4ef793248c39b0c2 upstream.
      
      generic/263 is failing fsx at this point with a page spanning
      EOF that cannot be invalidated. The operations are:
      
      1190 mapwrite   0x52c00 thru    0x5e569 (0xb96a bytes)
      1191 mapread    0x5c000 thru    0x5d636 (0x1637 bytes)
      1192 write      0x5b600 thru    0x771ff (0x1bc00 bytes)
      
      where 1190 extents EOF from 0x54000 to 0x5e569. When the direct IO
      write attempts to invalidate the cached page over this range, it
      fails with -EBUSY and so any attempt to do page invalidation fails.
      
      The real question is this: Why can't that page be invalidated after
      it has been written to disk and cleaned?
      
      Well, there's data on the first two buffers in the page (1k block
      size, 4k page), but the third buffer on the page (i.e. beyond EOF)
      is failing drop_buffers because it's bh->b_state == 0x3, which is
      BH_Uptodate | BH_Dirty.  IOWs, there's dirty buffers beyond EOF. Say
      what?
      
      OK, set_buffer_dirty() is called on all buffers from
      __set_page_buffers_dirty(), regardless of whether the buffer is
      beyond EOF or not, which means that when we get to ->writepage,
      we have buffers marked dirty beyond EOF that we need to clean.
      So, we need to implement our own .set_page_dirty method that
      doesn't dirty buffers beyond EOF.
      
      This is messy because the buffer code is not meant to be shared
      and it has interesting locking issues on the buffer dirty bits.
      So just copy and paste it and then modify it to suit what we need.
      
      Note: the solutions the other filesystems and generic block code use
      of marking the buffers clean in ->writepage does not work for XFS.
      It still leaves dirty buffers beyond EOF and invalidations still
      fail. Hence rather than play whack-a-mole, this patch simply
      prevents those buffers from being dirtied in the first place.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3430681f
    • Dave Chinner's avatar
      xfs: quotacheck leaves dquot buffers without verifiers · 9a9237c9
      Dave Chinner authored
      commit 5fd364fee81a7888af806e42ed8a91c845894f2d upstream.
      
      When running xfs/305, I noticed that quotacheck was flushing dquot
      buffers that did not have the xfs_dquot_buf_ops verifiers attached:
      
      XFS (vdb): _xfs_buf_ioapply: no ops on block 0x1dc8/0x1dc8
      ffff880052489000: 44 51 01 04 00 00 65 b8 00 00 00 00 00 00 00 00  DQ....e.........
      ffff880052489010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      ffff880052489020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      ffff880052489030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      CPU: 1 PID: 2376 Comm: mount Not tainted 3.16.0-rc2-dgc+ #306
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       ffff88006fe38000 ffff88004a0ffae8 ffffffff81cf1cca 0000000000000001
       ffff88004a0ffb88 ffffffff814d50ca 000010004a0ffc70 0000000000000000
       ffff88006be56dc4 0000000000000021 0000000000001dc8 ffff88007c773d80
      Call Trace:
       [<ffffffff81cf1cca>] dump_stack+0x45/0x56
       [<ffffffff814d50ca>] _xfs_buf_ioapply+0x3ca/0x3d0
       [<ffffffff810db520>] ? wake_up_state+0x20/0x20
       [<ffffffff814d51f5>] ? xfs_bdstrat_cb+0x55/0xb0
       [<ffffffff814d513b>] xfs_buf_iorequest+0x6b/0xd0
       [<ffffffff814d51f5>] xfs_bdstrat_cb+0x55/0xb0
       [<ffffffff814d53ab>] __xfs_buf_delwri_submit+0x15b/0x220
       [<ffffffff814d6040>] ? xfs_buf_delwri_submit+0x30/0x90
       [<ffffffff814d6040>] xfs_buf_delwri_submit+0x30/0x90
       [<ffffffff8150f89d>] xfs_qm_quotacheck+0x17d/0x3c0
       [<ffffffff81510591>] xfs_qm_mount_quotas+0x151/0x1e0
       [<ffffffff814ed01c>] xfs_mountfs+0x56c/0x7d0
       [<ffffffff814f0f12>] xfs_fs_fill_super+0x2c2/0x340
       [<ffffffff811c9fe4>] mount_bdev+0x194/0x1d0
       [<ffffffff814f0c50>] ? xfs_finish_flags+0x170/0x170
       [<ffffffff814ef0f5>] xfs_fs_mount+0x15/0x20
       [<ffffffff811ca8c9>] mount_fs+0x39/0x1b0
       [<ffffffff811e4d67>] vfs_kern_mount+0x67/0x120
       [<ffffffff811e757e>] do_mount+0x23e/0xad0
       [<ffffffff8117abde>] ? __get_free_pages+0xe/0x50
       [<ffffffff811e71e6>] ? copy_mount_options+0x36/0x150
       [<ffffffff811e8103>] SyS_mount+0x83/0xc0
       [<ffffffff81cfd40b>] tracesys+0xdd/0xe2
      
      This was caused by dquot buffer readahead not attaching a verifier
      structure to the buffer when readahead was issued, resulting in the
      followup read of the buffer finding a valid buffer and so not
      attaching new verifiers to the buffer as part of the read.
      
      Also, when a verifier failure occurs, we then read the buffer
      without verifiers. Attach the verifiers manually after this read so
      that if the buffer is then written it will be verified that the
      corruption has been repaired.
      
      Further, when flushing a dquot we don't ask for a verifier when
      reading in the dquot buffer the dquot belongs to. Most of the time
      this isn't an issue because the buffer is still cached, but when it
      is not cached it will result in writing the dquot buffer without
      having the verfier attached.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9a9237c9
    • Steve Wise's avatar
      RDMA/iwcm: Use a default listen backlog if needed · 433d80d6
      Steve Wise authored
      commit 2f0304d21867476394cd51a54e97f7273d112261 upstream.
      
      If the user creates a listening cm_id with backlog of 0 the IWCM ends
      up not allowing any connection requests at all.  The correct behavior
      is for the IWCM to pick a default value if the user backlog parameter
      is zero.
      
      Lustre from version 1.8.8 onward uses a backlog of 0, which breaks
      iwarp support without this fix.
      Signed-off-by: default avatarSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      433d80d6
    • NeilBrown's avatar
      md/raid10: Fix memory leak when raid10 reshape completes. · 26584e18
      NeilBrown authored
      commit b39685526f46976bcd13aa08c82480092befa46c upstream.
      
      When a raid10 commences a resync/recovery/reshape it allocates
      some buffer space.
      When a resync/recovery completes the buffer space is freed.  But not
      when the reshape completes.
      This can result in a small memory leak.
      
      There is a subtle side-effect of this bug.  When a RAID10 is reshaped
      to a larger array (more devices), the reshape is immediately followed
      by a "resync" of the new space.  This "resync" will use the buffer
      space which was allocated for "reshape".  This can cause problems
      including a "BUG" in the SCSI layer.  So this is suitable for -stable.
      
      Fixes: 3ea7daa5Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      26584e18
    • NeilBrown's avatar
      md/raid10: fix memory leak when reshaping a RAID10. · 1075d2bd
      NeilBrown authored
      commit ce0b0a46955d1bb389684a2605dbcaa990ba0154 upstream.
      
      raid10 reshape clears unwanted bits from a bio->bi_flags using
      a method which, while clumsy, worked until 3.10 when BIO_OWNS_VEC
      was added.
      Since then it clears that bit but shouldn't.  This results in a
      memory leak.
      
      So change to used the approved method of clearing unwanted bits.
      
      As this causes a memory leak which can consume all of memory
      the fix is suitable for -stable.
      
      Fixes: a38352e0
      Reported-by: mdraid.pkoch@dfgh.net (Peter Koch)
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1075d2bd
    • NeilBrown's avatar
      md/raid6: avoid data corruption during recovery of double-degraded RAID6 · 318a3d59
      NeilBrown authored
      commit 9c4bdf697c39805078392d5ddbbba5ae5680e0dd upstream.
      
      During recovery of a double-degraded RAID6 it is possible for
      some blocks not to be recovered properly, leading to corruption.
      
      If a write happens to one block in a stripe that would be written to a
      missing device, and at the same time that stripe is recovering data
      to the other missing device, then that recovered data may not be written.
      
      This patch skips, in the double-degraded case, an optimisation that is
      only safe for single-degraded arrays.
      
      Bug was introduced in 2.6.32 and fix is suitable for any kernel since
      then.  In an older kernel with separate handle_stripe5() and
      handle_stripe6() functions the patch must change handle_stripe6().
      
      Fixes: 6c0069c0
      Cc: Yuri Tikhonov <yur@emcraft.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Reported-by: default avatar"Manibalan P" <pmanibalan@amiindia.co.in>
      Tested-by: default avatar"Manibalan P" <pmanibalan@amiindia.co.in>
      Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Acked-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      318a3d59
    • Vignesh Raman's avatar
      Bluetooth: Avoid use of session socket after the session gets freed · 07b41b34
      Vignesh Raman authored
      commit 32333edb82fb2009980eefc5518100068147ab82 upstream.
      
      The commits 08c30aca "Bluetooth: Remove
      RFCOMM session refcnt" and 8ff52f7d
      "Bluetooth: Return RFCOMM session ptrs to avoid freed session"
      allow rfcomm_recv_ua and rfcomm_session_close to delete the session
      (and free the corresponding socket) and propagate NULL session pointer
      to the upper callers.
      
      Additional fix is required to terminate the loop in rfcomm_process_rx
      function to avoid use of freed 'sk' memory.
      
      The issue is only reproducible with kernel option CONFIG_PAGE_POISONING
      enabled making freed memory being changed and filled up with fixed char
      value used to unmask use-after-free issues.
      Signed-off-by: default avatarVignesh Raman <Vignesh_Raman@mentor.com>
      Signed-off-by: default avatarVitaly Kuzmichev <Vitaly_Kuzmichev@mentor.com>
      Acked-by: default avatarDean Jenkins <Dean_Jenkins@mentor.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07b41b34
    • Vladimir Davydov's avatar
      Bluetooth: never linger on process exit · 819f3e7a
      Vladimir Davydov authored
      commit 093facf3634da1b0c2cc7ed106f1983da901bbab upstream.
      
      If the current process is exiting, lingering on socket close will make
      it unkillable, so we should avoid it.
      
      Reproducer:
      
        #include <sys/types.h>
        #include <sys/socket.h>
      
        #define BTPROTO_L2CAP   0
        #define BTPROTO_SCO     2
        #define BTPROTO_RFCOMM  3
      
        int main()
        {
                int fd;
                struct linger ling;
      
                fd = socket(PF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM);
                //or: fd = socket(PF_BLUETOOTH, SOCK_DGRAM, BTPROTO_L2CAP);
                //or: fd = socket(PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_SCO);
      
                ling.l_onoff = 1;
                ling.l_linger = 1000000000;
                setsockopt(fd, SOL_SOCKET, SO_LINGER, &ling, sizeof(ling));
      
                return 0;
        }
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      819f3e7a
    • Eric W. Biederman's avatar
      mnt: Add tests for unprivileged remount cases that have found to be faulty · bbeed681
      Eric W. Biederman authored
      commit db181ce011e3c033328608299cd6fac06ea50130 upstream.
      
      Kenton Varda <kenton@sandstorm.io> discovered that by remounting a
      read-only bind mount read-only in a user namespace the
      MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user
      to the remount a read-only mount read-write.
      
      Upon review of the code in remount it was discovered that the code allowed
      nosuid, noexec, and nodev to be cleared.  It was also discovered that
      the code was allowing the per mount atime flags to be changed.
      
      The first naive patch to fix these issues contained the flaw that using
      default atime settings when remounting a filesystem could be disallowed.
      
      To avoid this problems in the future add tests to ensure unprivileged
      remounts are succeeding and failing at the appropriate times.
      Acked-by: default avatarSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bbeed681
    • Eric W. Biederman's avatar
      mnt: Change the default remount atime from relatime to the existing value · 99dd97b8
      Eric W. Biederman authored
      commit ffbc6f0ead47fa5a1dc9642b0331cb75c20a640e upstream.
      
      Since March 2009 the kernel has treated the state that if no
      MS_..ATIME flags are passed then the kernel defaults to relatime.
      
      Defaulting to relatime instead of the existing atime state during a
      remount is silly, and causes problems in practice for people who don't
      specify any MS_...ATIME flags and to get the default filesystem atime
      setting.  Those users may encounter a permission error because the
      default atime setting does not work.
      
      A default that does not work and causes permission problems is
      ridiculous, so preserve the existing value to have a default
      atime setting that is always guaranteed to work.
      
      Using the default atime setting in this way is particularly
      interesting for applications built to run in restricted userspace
      environments without /proc mounted, as the existing atime mount
      options of a filesystem can not be read from /proc/mounts.
      
      In practice this fixes user space that uses the default atime
      setting on remount that are broken by the permission checks
      keeping less privileged users from changing more privileged users
      atime settings.
      Acked-by: default avatarSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99dd97b8
    • Eric W. Biederman's avatar
      mnt: Correct permission checks in do_remount · 187985d9
      Eric W. Biederman authored
      commit 9566d6742852c527bf5af38af5cbb878dad75705 upstream.
      
      While invesgiating the issue where in "mount --bind -oremount,ro ..."
      would result in later "mount --bind -oremount,rw" succeeding even if
      the mount started off locked I realized that there are several
      additional mount flags that should be locked and are not.
      
      In particular MNT_NOSUID, MNT_NODEV, MNT_NOEXEC, and the atime
      flags in addition to MNT_READONLY should all be locked.  These
      flags are all per superblock, can all be changed with MS_BIND,
      and should not be changable if set by a more privileged user.
      
      The following additions to the current logic are added in this patch.
      - nosuid may not be clearable by a less privileged user.
      - nodev  may not be clearable by a less privielged user.
      - noexec may not be clearable by a less privileged user.
      - atime flags may not be changeable by a less privileged user.
      
      The logic with atime is that always setting atime on access is a
      global policy and backup software and auditing software could break if
      atime bits are not updated (when they are configured to be updated),
      and serious performance degradation could result (DOS attack) if atime
      updates happen when they have been explicitly disabled.  Therefore an
      unprivileged user should not be able to mess with the atime bits set
      by a more privileged user.
      
      The additional restrictions are implemented with the addition of
      MNT_LOCK_NOSUID, MNT_LOCK_NODEV, MNT_LOCK_NOEXEC, and MNT_LOCK_ATIME
      mnt flags.
      
      Taken together these changes and the fixes for MNT_LOCK_READONLY
      should make it safe for an unprivileged user to create a user
      namespace and to call "mount --bind -o remount,... ..." without
      the danger of mount flags being changed maliciously.
      Acked-by: default avatarSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      187985d9
    • Eric W. Biederman's avatar
      mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount · 81d4c13e
      Eric W. Biederman authored
      commit 07b645589dcda8b7a5249e096fece2a67556f0f4 upstream.
      
      There are no races as locked mount flags are guaranteed to never change.
      
      Moving the test into do_remount makes it more visible, and ensures all
      filesystem remounts pass the MNT_LOCK_READONLY permission check.  This
      second case is not an issue today as filesystem remounts are guarded
      by capable(CAP_DAC_ADMIN) and thus will always fail in less privileged
      mount namespaces, but it could become an issue in the future.
      Acked-by: default avatarSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      81d4c13e
    • Eric W. Biederman's avatar
      mnt: Only change user settable mount flags in remount · 8c30f227
      Eric W. Biederman authored
      commit a6138db815df5ee542d848318e5dae681590fccd upstream.
      
      Kenton Varda <kenton@sandstorm.io> discovered that by remounting a
      read-only bind mount read-only in a user namespace the
      MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user
      to the remount a read-only mount read-write.
      
      Correct this by replacing the mask of mount flags to preserve
      with a mask of mount flags that may be changed, and preserve
      all others.   This ensures that any future bugs with this mask and
      remount will fail in an easy to detect way where new mount flags
      simply won't change.
      Acked-by: default avatarSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c30f227
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Up rb_iter_peek() loop count to 3 · 7f70b62e
      Steven Rostedt (Red Hat) authored
      commit 021de3d904b88b1771a3a2cfc5b75023c391e646 upstream.
      
      After writting a test to try to trigger the bug that caused the
      ring buffer iterator to become corrupted, I hit another bug:
      
       WARNING: CPU: 1 PID: 5281 at kernel/trace/ring_buffer.c:3766 rb_iter_peek+0x113/0x238()
       Modules linked in: ipt_MASQUERADE sunrpc [...]
       CPU: 1 PID: 5281 Comm: grep Tainted: G        W     3.16.0-rc3-test+ #143
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
        0000000000000000 ffffffff81809a80 ffffffff81503fb0 0000000000000000
        ffffffff81040ca1 ffff8800796d6010 ffffffff810c138d ffff8800796d6010
        ffff880077438c80 ffff8800796d6010 ffff88007abbe600 0000000000000003
       Call Trace:
        [<ffffffff81503fb0>] ? dump_stack+0x4a/0x75
        [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97
        [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
        [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
        [<ffffffff810c14df>] ? ring_buffer_iter_peek+0x2d/0x5c
        [<ffffffff810c6f73>] ? tracing_iter_reset+0x6e/0x96
        [<ffffffff810c74a3>] ? s_start+0xd7/0x17b
        [<ffffffff8112b13e>] ? kmem_cache_alloc_trace+0xda/0xea
        [<ffffffff8114cf94>] ? seq_read+0x148/0x361
        [<ffffffff81132d98>] ? vfs_read+0x93/0xf1
        [<ffffffff81132f1b>] ? SyS_read+0x60/0x8e
        [<ffffffff8150bf9f>] ? tracesys+0xdd/0xe2
      
      Debugging this bug, which triggers when the rb_iter_peek() loops too
      many times (more than 2 times), I discovered there's a case that can
      cause that function to legitimately loop 3 times!
      
      rb_iter_peek() is different than rb_buffer_peek() as the rb_buffer_peek()
      only deals with the reader page (it's for consuming reads). The
      rb_iter_peek() is for traversing the buffer without consuming it, and as
      such, it can loop for one more reason. That is, if we hit the end of
      the reader page or any page, it will go to the next page and try again.
      
      That is, we have this:
      
       1. iter->head > iter->head_page->page->commit
          (rb_inc_iter() which moves the iter to the next page)
          try again
      
       2. event = rb_iter_head_event()
          event->type_len == RINGBUF_TYPE_TIME_EXTEND
          rb_advance_iter()
          try again
      
       3. read the event.
      
      But we never get to 3, because the count is greater than 2 and we
      cause the WARNING and return NULL.
      
      Up the counter to 3.
      
      Fixes: 69d1b839 "ring-buffer: Bind time extend and data events together"
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f70b62e
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Always reset iterator to reader page · 814aa5ad
      Steven Rostedt (Red Hat) authored
      commit 651e22f2701b4113989237c3048d17337dd2185c upstream.
      
      When performing a consuming read, the ring buffer swaps out a
      page from the ring buffer with a empty page and this page that
      was swapped out becomes the new reader page. The reader page
      is owned by the reader and since it was swapped out of the ring
      buffer, writers do not have access to it (there's an exception
      to that rule, but it's out of scope for this commit).
      
      When reading the "trace" file, it is a non consuming read, which
      means that the data in the ring buffer will not be modified.
      When the trace file is opened, a ring buffer iterator is allocated
      and writes to the ring buffer are disabled, such that the iterator
      will not have issues iterating over the data.
      
      Although the ring buffer disabled writes, it does not disable other
      reads, or even consuming reads. If a consuming read happens, then
      the iterator is reset and starts reading from the beginning again.
      
      My tests would sometimes trigger this bug on my i386 box:
      
      WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa()
      Modules linked in:
      CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8
      Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
       00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b
       f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0
       ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80^M
      Call Trace:
       [<c18796b3>] dump_stack+0x4b/0x75
       [<c103a0e3>] warn_slowpath_common+0x7e/0x95
       [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
       [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
       [<c103a185>] warn_slowpath_fmt+0x33/0x35
       [<c10bd85a>] __trace_find_cmdline+0x66/0xaa^M
       [<c10bed04>] trace_find_cmdline+0x40/0x64
       [<c10c3c16>] trace_print_context+0x27/0xec
       [<c10c4360>] ? trace_seq_printf+0x37/0x5b
       [<c10c0b15>] print_trace_line+0x319/0x39b
       [<c10ba3fb>] ? ring_buffer_read+0x47/0x50
       [<c10c13b1>] s_show+0x192/0x1ab
       [<c10bfd9a>] ? s_next+0x5a/0x7c
       [<c112e76e>] seq_read+0x267/0x34c
       [<c1115a25>] vfs_read+0x8c/0xef
       [<c112e507>] ? seq_lseek+0x154/0x154
       [<c1115ba2>] SyS_read+0x54/0x7f
       [<c188488e>] syscall_call+0x7/0xb
      ---[ end trace 3f507febd6b4cc83 ]---
      >>>> ##### CPU 1 buffer started ####
      
      Which was the __trace_find_cmdline() function complaining about the pid
      in the event record being negative.
      
      After adding more test cases, this would trigger more often. Strangely
      enough, it would never trigger on a single test, but instead would trigger
      only when running all the tests. I believe that was the case because it
      required one of the tests to be shutting down via delayed instances while
      a new test started up.
      
      After spending several days debugging this, I found that it was caused by
      the iterator becoming corrupted. Debugging further, I found out why
      the iterator became corrupted. It happened with the rb_iter_reset().
      
      As consuming reads may not read the full reader page, and only part
      of it, there's a "read" field to know where the last read took place.
      The iterator, must also start at the read position. In the rb_iter_reset()
      code, if the reader page was disconnected from the ring buffer, the iterator
      would start at the head page within the ring buffer (where writes still
      happen). But the mistake there was that it still used the "read" field
      to start the iterator on the head page, where it should always start
      at zero because readers never read from within the ring buffer where
      writes occur.
      
      I originally wrote a patch to have it set the iter->head to 0 instead
      of iter->head_page->read, but then I questioned why it wasn't always
      setting the iter to point to the reader page, as the reader page is
      still valid.  The list_empty(reader_page->list) just means that it was
      successful in swapping out. But the reader_page may still have data.
      
      There was a bug report a long time ago that was not reproducible that
      had something about trace_pipe (consuming read) not matching trace
      (iterator read). This may explain why that happened.
      
      Anyway, the correct answer to this bug is to always use the reader page
      an not reset the iterator to inside the writable ring buffer.
      
      Fixes: d769041f "ring_buffer: implement new locking"
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      814aa5ad
    • Jiri Kosina's avatar
      ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock · 4f6a1e62
      Jiri Kosina authored
      commit 6726655dfdd2dc60c035c690d9f10cb69d7ea075 upstream.
      
      There is a following AB-BA dependency between cpu_hotplug.lock and
      cpuidle_lock:
      
      1) cpu_hotplug.lock -> cpuidle_lock
      enable_nonboot_cpus()
       _cpu_up()
        cpu_hotplug_begin()
         LOCK(cpu_hotplug.lock)
       cpu_notify()
        ...
        acpi_processor_hotplug()
         cpuidle_pause_and_lock()
          LOCK(cpuidle_lock)
      
      2) cpuidle_lock -> cpu_hotplug.lock
      acpi_os_execute_deferred() workqueue
       ...
       acpi_processor_cst_has_changed()
        cpuidle_pause_and_lock()
         LOCK(cpuidle_lock)
        get_online_cpus()
         LOCK(cpu_hotplug.lock)
      
      Fix this by reversing the order acpi_processor_cst_has_changed() does
      thigs -- let it first execute the protection against CPU hotplug by
      calling get_online_cpus() and obtain the cpuidle lock only after that (and
      perform the symmentric change when allowing CPUs hotplug again and
      dropping cpuidle lock).
      
      Spotted by lockdep.
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f6a1e62
    • Lan Tianyu's avatar
      ACPI: Run fixed event device notifications in process context · c55d35d2
      Lan Tianyu authored
      commit 236105db632c6279a020f78c83e22eaef746006b upstream.
      
      Currently, notify callbacks for fixed button events are run from
      interrupt context.  That is not necessary and after commit 0bf6368ee8f2
      (ACPI / button: Add ACPI Button event via netlink routine) it causes
      netlink routines to be called from interrupt context which is not
      correct.
      
      Also, that is different from non-fixed device events (including
      non-fixed button events) whose notify callbacks are all executed from
      process context.
      
      For the above reasons, make fixed button device notify callbacks run
      in process context which will avoid the deadlock when using netlink
      to report button events to user space.
      
      Fixes: 0bf6368ee8f2 (ACPI / button: Add ACPI Button event via netlink routine)
      Link: https://lkml.org/lkml/2014/8/21/606Reported-by: default avatarBenjamin Block <bebl@mageta.org>
      Reported-by: default avatarKnut Petersen <Knut_Petersen@t-online.de>
      Signed-off-by: default avatarLan Tianyu <tianyu.lan@intel.com>
      [rjw: Function names, subject and changelog.]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c55d35d2
    • David E. Box's avatar
      ACPICA: Utilities: Fix memory leak in acpi_ut_copy_iobject_to_iobject · b3e98f0c
      David E. Box authored
      commit 8aa5e56eeb61a099ea6519eb30ee399e1bc043ce upstream.
      
      Adds return status check on copy routines to delete the allocated destination
      object if either copy fails. Reported by Colin Ian King on bugs.acpica.org,
      Bug 1087.
      The last applicable commit:
       Commit: 3371c19c
       Subject: ACPICA: Remove ACPI_GET_OBJECT_TYPE macro
      
      Link: https://bugs.acpica.org/show_bug.cgi?id=1087Reported-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid E. Box <david.e.box@linux.intel.com>
      Signed-off-by: default avatarBob Moore <robert.moore@intel.com>
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3e98f0c
    • Ben Hutchings's avatar
      bfa: Fix undefined bit shift on big-endian architectures with 32-bit DMA address · db065663
      Ben Hutchings authored
      commit 03a6c3ff3282ee9fa893089304d951e0be93a144 upstream.
      
      bfa_swap_words() shifts its argument (assumed to be 64-bit) by 32 bits
      each way.  In two places the argument type is dma_addr_t, which may be
      32-bit, in which case the effect of the bit shift is undefined:
      
      drivers/scsi/bfa/bfa_fcpim.c: In function 'bfa_ioim_send_ioreq':
      drivers/scsi/bfa/bfa_fcpim.c:2497:4: warning: left shift count >= width of type [enabled by default]
          addr = bfa_sgaddr_le(sg_dma_address(sg));
          ^
      drivers/scsi/bfa/bfa_fcpim.c:2497:4: warning: right shift count >= width of type [enabled by default]
      drivers/scsi/bfa/bfa_fcpim.c:2509:4: warning: left shift count >= width of type [enabled by default]
          addr = bfa_sgaddr_le(sg_dma_address(sg));
          ^
      drivers/scsi/bfa/bfa_fcpim.c:2509:4: warning: right shift count >= width of type [enabled by default]
      
      Avoid this by adding casts to u64 in bfa_swap_words().
      
      Compile-tested only.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Reviewed-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Acked-by: default avatarAnil Gurumurthy <anil.gurumurthy@qlogic.com>
      Fixes: f16a1750 ('[SCSI] bfa: remove all OS wrappers')
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      db065663
    • Daniel Mack's avatar
      ASoC: pxa-ssp: drop SNDRV_PCM_FMTBIT_S24_LE · f4d475ab
      Daniel Mack authored
      commit 9301503af016eb537ccce76adec0c1bb5c84871e upstream.
      
      This mode is unsupported, as the DMA controller can't do zero-padding
      of samples.
      Signed-off-by: default avatarDaniel Mack <zonque@gmail.com>
      Reported-by: default avatarJohannes Stezenbach <js@sig21.net>
      Signed-off-by: default avatarMark Brown <broonie@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4d475ab
    • Jarkko Nikula's avatar
      ASoC: max98090: Fix missing free_irq · f084c942
      Jarkko Nikula authored
      commit 4adeb0ccf86a5af1825bbfe290dee9e60a5ab870 upstream.
      
      max98090.c doesn't free the threaded interrupt it requests. This causes
      an oops when doing "cat /proc/interrupts" after snd-soc-max98090.ko is
      unloaded.
      
      Fix this by requesting the interrupt by using devm_request_threaded_irq().
      Signed-off-by: default avatarJarkko Nikula <jarkko.nikula@linux.intel.com>
      Signed-off-by: default avatarMark Brown <broonie@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f084c942
    • Sylwester Nawrocki's avatar
      ASoC: samsung: Correct I2S DAI suspend/resume ops · ef49cea3
      Sylwester Nawrocki authored
      commit d3d4e5247b013008a39e4d5f69ce4c60ed57f997 upstream.
      
      We should save/restore relevant I2S registers regardless of
      the dai->active flag, otherwise some settings are being lost
      after system suspend/resume cycle. E.g. I2S slave mode set only
      during dai initialization is not preserved and the device ends
      up in master mode after system resume.
      Signed-off-by: default avatarSylwester Nawrocki <s.nawrocki@samsung.com>
      Signed-off-by: default avatarMark Brown <broonie@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef49cea3
    • Praveen Diwakar's avatar
      ASoC: wm_adsp: Add missing MODULE_LICENSE · f3327152
      Praveen Diwakar authored
      commit 0a37c6efec4a2fdc2563c5a8faa472b814deee80 upstream.
      
      Since MODULE_LICENSE is missing the module load fails,
      so add this for module.
      Signed-off-by: default avatarPraveen Diwakar <praveen.diwakar@intel.com>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Reviewed-by: default avatarCharles Keepax <ckeepax@opensource.wolfsonmicro.com>
      Signed-off-by: default avatarMark Brown <broonie@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3327152
    • Qiao Zhou's avatar
      ASoC: pcm: fix dpcm_path_put in dpcm runtime update · 24be9aa6
      Qiao Zhou authored
      commit 7ed9de76ff342cbd717a9cf897044b99272cb8f8 upstream.
      
      we need to release dapm widget list after dpcm_path_get in
      soc_dpcm_runtime_update. otherwise, there will be potential memory
      leak. add dpcm_path_put to fix it.
      Signed-off-by: default avatarQiao Zhou <zhouqiao@marvell.com>
      Signed-off-by: default avatarMark Brown <broonie@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      24be9aa6
    • Jonas Bonn's avatar
      openrisc: Rework signal handling · af7b15c9
      Jonas Bonn authored
      commit 10f67dbf6add97751050f294d4c8e0cc1e5c2c23 upstream.
      
      The mainline signal handling code for OpenRISC has been buggy since day
      one with respect to syscall restart.  This patch significantly reworks
      the signal handling code:
      
      i)   Move the "work pending" loop to C code (borrowed from ARM arch)
      
      ii)  Allow a tracer to muck about with the IP and skip syscall restart
           in that case (again, borrowed from ARM)
      
      iii) Make signal handling WRT syscall restart actually work
      
      v)   Make the signal handling code look more like that of other
           architectures so that it's easier for others to follow
      Reported-by: default avatarAnders Nystrom <anders@southpole.se>
      Signed-off-by: default avatarJonas Bonn <jonas@southpole.se>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      af7b15c9
    • Ralf Baechle's avatar
      MIPS: Fix accessing to per-cpu data when flushing the cache · 4f91cb53
      Ralf Baechle authored
      commit ff522058bd717506b2fa066fa564657f2b86477e upstream.
      
      This fixes the following issue
      
      BUG: using smp_processor_id() in preemptible [00000000] code: kjournald/1761
      caller is blast_dcache32+0x30/0x254
      Call Trace:
      [<8047f02c>] dump_stack+0x8/0x34
      [<802e7e40>] debug_smp_processor_id+0xe0/0xf0
      [<80114d94>] blast_dcache32+0x30/0x254
      [<80118484>] r4k_dma_cache_wback_inv+0x200/0x288
      [<80110ff0>] mips_dma_map_sg+0x108/0x180
      [<80355098>] ide_dma_prepare+0xf0/0x1b8
      [<8034eaa4>] do_rw_taskfile+0x1e8/0x33c
      [<8035951c>] ide_do_rw_disk+0x298/0x3e4
      [<8034a3c4>] do_ide_request+0x2e0/0x704
      [<802bb0dc>] __blk_run_queue+0x44/0x64
      [<802be000>] queue_unplugged.isra.36+0x1c/0x54
      [<802beb94>] blk_flush_plug_list+0x18c/0x24c
      [<802bec6c>] blk_finish_plug+0x18/0x48
      [<8026554c>] journal_commit_transaction+0x3b8/0x151c
      [<80269648>] kjournald+0xec/0x238
      [<8014ac00>] kthread+0xb8/0xc0
      [<8010268c>] ret_from_kernel_thread+0x14/0x1c
      
      Caches in most systems are identical - but not always, so we can't avoid
      the use of smp_call_function() by just looking at the boot CPU's data,
      have to fiddle with preemption instead.
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/5835
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f91cb53
    • Aaro Koskinen's avatar
      MIPS: OCTEON: make get_system_type() thread-safe · d2a3ec39
      Aaro Koskinen authored
      commit 608308682addfdc7b8e2aee88f0e028331d88e4d upstream.
      
      get_system_type() is not thread-safe on OCTEON. It uses static data,
      also more dangerous issue is that it's calling cvmx_fuse_read_byte()
      every time without any synchronization. Currently it's possible to get
      processes stuck looping forever in kernel simply by launching multiple
      readers of /proc/cpuinfo:
      
      	(while true; do cat /proc/cpuinfo > /dev/null; done) &
      	(while true; do cat /proc/cpuinfo > /dev/null; done) &
      	...
      
      Fix by initializing the system type string only once during the early
      boot.
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@nsn.com>
      Reviewed-by: default avatarMarkos Chandras <markos.chandras@imgtec.com>
      Patchwork: http://patchwork.linux-mips.org/patch/7437/Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2a3ec39
    • Markos Chandras's avatar
      MIPS: asm: thread_info: Add _TIF_SECCOMP flag · 4fc5ea5e
      Markos Chandras authored
      commit 137f7df8cead00688524c82360930845396b8a21 upstream.
      
      Add _TIF_SECCOMP flag to _TIF_WORK_SYSCALL_ENTRY to indicate
      that the system call needs to be checked against a seccomp filter.
      Signed-off-by: default avatarMarkos Chandras <markos.chandras@imgtec.com>
      Reviewed-by: default avatarPaul Burton <paul.burton@imgtec.com>
      Reviewed-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/6405/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      [bwh: Backported to 3.2: various other flags are not included in
       _TIF_WORK_SYSCALL_ENTRY]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4fc5ea5e
    • Ralf Baechle's avatar
      MIPS: Cleanup flags in syscall flags handlers. · 1b91a02f
      Ralf Baechle authored
      commit e7f3b48af7be9f8007a224663a5b91340626fed5 upstream.
      
      This will simplify further modifications.
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b91a02f
    • Alex Smith's avatar
      MIPS: asm/reg.h: Make 32- and 64-bit definitions available at the same time · 887c1489
      Alex Smith authored
      commit bcec7c8da6b092b1ff3327fd83c2193adb12f684 upstream.
      
      Get rid of the WANT_COMPAT_REG_H test and instead define both the 32-
      and 64-bit register offset definitions at the same time with
      MIPS{32,64}_ prefixes, then define the existing EF_* names to the
      correct definitions for the kernel's bitness.
      
      This patch is a prerequisite of the following bug fix patch.
      Signed-off-by: default avatarAlex Smith <alex@alex-smith.me.uk>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/7451/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      887c1489
    • Huacai Chen's avatar
      MIPS: Remove BUG_ON(!is_fpu_owner()) in do_ade() · f020cedd
      Huacai Chen authored
      commit 2e5767a27337812f6850b3fa362419e2f085e5c3 upstream.
      
      In do_ade(), is_fpu_owner() isn't preempt-safe. For example, when an
      unaligned ldc1 is executed, do_cpu() is called and then FPU will be
      enabled (and TIF_USEDFPU will be set for the current process). Then,
      do_ade() is called because the access is unaligned.  If the current
      process is preempted at this time, TIF_USEDFPU will be cleard.  So when
      the process is scheduled again, BUG_ON(!is_fpu_owner()) is triggered.
      
      This small program can trigger this BUG in a preemptible kernel:
      
      int main (int argc, char *argv[])
      {
              double u64[2];
      
              while (1) {
                      asm volatile (
                              ".set push \n\t"
                              ".set noreorder \n\t"
                              "ldc1 $f3, 4(%0) \n\t"
                              ".set pop \n\t"
                              ::"r"(u64):
                      );
              }
      
              return 0;
      }
      
      V2: Remove the BUG_ON() unconditionally due to Paul's suggestion.
      Signed-off-by: default avatarHuacai Chen <chenhc@lemote.com>
      Signed-off-by: default avatarJie Chen <chenj@lemote.com>
      Signed-off-by: default avatarRui Wang <wangr@lemote.com>
      Cc: John Crispin <john@phrozen.org>
      Cc: Steven J. Hill <Steven.Hill@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: Fuxin Zhang <zhangfx@lemote.com>
      Cc: Zhangjin Wu <wuzhangjin@gmail.com>
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f020cedd
    • Huacai Chen's avatar
      MIPS: tlbex: Fix a missing statement for HUGETLB · 33103cff
      Huacai Chen authored
      commit 8393c524a25609a30129e4a8975cf3b91f6c16a5 upstream.
      
      In commit 2c8c53e2 (MIPS: Optimize TLB handlers for Octeon CPUs)
      build_r4000_tlb_refill_handler() is modified. But it doesn't compatible
      with the original code in HUGETLB case. Because there is a copy & paste
      error and one line of code is missing. It is very easy to produce a bug
      with LTP's hugemmap05 test.
      Signed-off-by: default avatarHuacai Chen <chenhc@lemote.com>
      Signed-off-by: default avatarBinbin Zhou <zhoubb@lemote.com>
      Cc: John Crispin <john@phrozen.org>
      Cc: Steven J. Hill <Steven.Hill@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: Fuxin Zhang <zhangfx@lemote.com>
      Cc: Zhangjin Wu <wuzhangjin@gmail.com>
      Patchwork: https://patchwork.linux-mips.org/patch/7496/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      33103cff
    • Paul Burton's avatar
      MIPS: Prevent user from setting FCSR cause bits · fbd9df2e
      Paul Burton authored
      commit b1442d39fac2fcfbe6a4814979020e993ca59c9e upstream.
      
      If one or more matching FCSR cause & enable bits are set in saved thread
      context then when that context is restored the kernel will take an FP
      exception. This is of course undesirable and considered an oops, leading
      to the kernel writing a backtrace to the console and potentially
      rebooting depending upon the configuration. Thus the kernel avoids this
      situation by clearing the cause bits of the FCSR register when handling
      FP exceptions and after emulating FP instructions.
      
      However the kernel does not prevent userland from setting arbitrary FCSR
      cause & enable bits via ptrace, using either the PTRACE_POKEUSR or
      PTRACE_SETFPREGS requests. This means userland can trivially cause the
      kernel to oops on any system with an FPU. Prevent this from happening
      by clearing the cause bits when writing to the saved FCSR context via
      ptrace.
      
      This problem appears to exist at least back to the beginning of the git
      era in the PTRACE_POKEUSR case.
      Signed-off-by: default avatarPaul Burton <paul.burton@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: stable@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/7438/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fbd9df2e
    • Jeffrey Deans's avatar
      MIPS: GIC: Prevent array overrun · f52337e5
      Jeffrey Deans authored
      commit ffc8415afab20bd97754efae6aad1f67b531132b upstream.
      
      A GIC interrupt which is declared as having a GIC_MAP_TO_NMI_MSK
      mapping causes the cpu parameter to gic_setup_intr() to be increased
      to 32, causing memory corruption when pcpu_masks[] is written to again
      later in the function.
      Signed-off-by: default avatarJeffrey Deans <jeffrey.deans@imgtec.com>
      Signed-off-by: default avatarMarkos Chandras <markos.chandras@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/7375/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f52337e5
    • K. Y. Srinivasan's avatar
      drivers: scsi: storvsc: Correctly handle TEST_UNIT_READY failure · 8ce6d81a
      K. Y. Srinivasan authored
      commit 3533f8603d28b77c62d75ec899449a99bc6b77a1 upstream.
      
      On some Windows hosts on FC SANs, TEST_UNIT_READY can return SRB_STATUS_ERROR.
      Correctly handle this. Note that there is sufficient sense information to
      support scsi error handling even in this case.
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8ce6d81a
    • K. Y. Srinivasan's avatar
      Drivers: scsi: storvsc: Implement a eh_timed_out handler · 7afc3ac1
      K. Y. Srinivasan authored
      commit 56b26e69c8283121febedd12b3cc193384af46b9 upstream.
      
      On Azure, we have seen instances of unbounded I/O latencies. To deal with
      this issue, implement handler that can reset the timeout. Note that the
      host gaurantees that it will respond to each command that has been issued.
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      [hch: added a better comment explaining the issue]
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7afc3ac1