Release notes for Gluster 4.1.0

This is a major release that includes a range of features enhancing management, performance, and monitoring, and introduces newer functionality such as thin arbiters, cloud archival, and time consistency. It also contains several bug fixes.

A selection of the important features and changes are documented on this page. A full list of bugs that have been addressed is included further below.

Announcements

  1. As 4.0 was a short term maintenance release, features that were included in that release are available in 4.1.0 as well. These features may be of interest to users upgrading to 4.1.0 from releases older than 4.0. The 4.0 release notes capture the list of features that were introduced with 4.0.

NOTE: As 4.0 was a short term maintenance release, it will reach end of life (EOL) with the release of 4.1.0. (reference)

  1. Releases that receive maintenance updates post the 4.1 release are 3.12 and 4.1 (reference)

NOTE: 3.10, a long term maintenance release, will reach end of life (EOL) with the release of 4.1.0. (reference)

  1. Continuing with this release, the CentOS storage SIG will not build server packages for CentOS6. Server packages will be available for CentOS7 only. For ease of migration, client packages for CentOS6 will continue to be published and maintained.

NOTE: This change was announced here

Major changes and features

Features are categorized into the following sections:

Management

GlusterD2

IMP: GlusterD2 in Gluster-4.1.0 is still considered a preview and is experimental. It should not be considered for production use. Users should still expect breaking changes to be possible, though efforts will be made to avoid such changes. As GD2 is still under heavy development, new features can be expected throughout the 4.1 release.

GD2 brings initial support for rebalance, snapshots, intelligent volume provisioning and a lot of other bug fixes and internal changes.

Rebalance #786

GD2 supports running rebalance on volumes. Supported rebalance operations include:

  - rebalance start
  - rebalance start with fix-layout
  - rebalance stop
  - rebalance status

Support only exists in the ReST API right now. CLI support will be introduced in subsequent releases.
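As CLI support is pending, rebalance is driven over the ReST API for now. The sketch below is illustrative only: the endpoint paths, host name, and port are assumptions and may differ from the actual GD2 API; consult the GD2 API documentation for the exact routes.

```shell
# Start rebalance on volume "myvol" (endpoint paths assumed, not authoritative)
curl -X POST http://gd2-host:24007/v1/volumes/myvol/rebalance/start

# Check status, then stop
curl http://gd2-host:24007/v1/volumes/myvol/rebalance/status
curl -X POST http://gd2-host:24007/v1/volumes/myvol/rebalance/stop
```

These calls require a running GD2 cluster.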

Snapshot #533

Initial support for volume snapshot has been introduced. At the moment, snapshots are supported only on Thin-LVM bricks.

Supported snapshot operations include:

  - create
  - activate/deactivate
  - list
  - info

Intelligent volume provisioning (IVP) #661

GD2 brings a very early preview of intelligent volume creation, similar to Heketi.

IMP: This is considered experimental, and the API and implementation are not final. It is very possible that both the API and the implementation will change.

IVP enables users to create volumes by providing just the expected volume type and a size, without providing the brick layout. IVP is supported in the CLI via the normal volume create command.

More information on IVP can be found in the pull-request.
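As a hedged sketch of what the CLI form could look like (the flag names and the size shown are illustrative assumptions, not the final syntax; see the pull-request for the actual interface):

```shell
# Create a replica-3 volume of a given size without listing bricks;
# GD2 selects bricks from the registered block devices.
# Flag names below are assumed for illustration only.
glustercli volume create myvol --size 500GiB --replica 3
```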

To support IVP, support for adding and managing block devices, and basic support for zones is available. #783 #785

Other changes

Other notable changes include:

  - Support for volume option levels (experimental, advanced, deprecated) #591
  - Support for resetting volume options #545
  - Option hooks for volume set #708
  - Support for setting quota options #583
  - Changes to transaction locking #808
  - Support for setting metadata on peers and volumes #600 #689 #704
  - Thin arbiter support #673 #702

In addition to the above, a lot of smaller bug fixes and enhancements to internal frameworks and tests have also been made.

Known issues

GD2 is still under heavy development and has lots of known bugs. For filing new bugs or tracking known bugs, please use the GD2 github issue tracker.

2. Changes to gluster based smb.conf share management

Previously, Gluster would delete the entire volume share section from smb.conf after a volume was stopped or when the user.cifs/user.smb volume set options were disabled. With this release, volume share sections that were added to smb.conf by the Samba hook scripts are no longer removed after a volume stop or on disabling the user.cifs/user.smb volume set options. Instead, the following share-specific smb.conf parameter is added to the end of the corresponding volume share section to make it unavailable for client access:

available = no

This ensures that additional smb.conf parameters configured externally are retained. For more details on the above parameter, search for "available (S)" in the smb.conf(5) manual page.
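For example, a hook-script-generated share section might look like the following after a volume stop (the share name, volume name, and most parameters here are illustrative, not the exact hook script output):

```ini
[gluster-myvol]
comment = For samba share of volume myvol
vfs objects = glusterfs
glusterfs:volume = myvol
read only = no
guest ok = yes
; appended by Gluster on volume stop or on disabling user.cifs/user.smb:
available = no
```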

Monitoring

Various xlators are enhanced to provide additional metrics that help in determining the effectiveness of the xlator in various workloads.

These metrics can be dumped and visualized as detailed here.

1. Additional metrics added to negative lookup cache xlator

Metrics added are:

  - negative_lookup_hit_count
  - negative_lookup_miss_count
  - get_real_filename_hit_count
  - get_real_filename_miss_count
  - nameless_lookup_count
  - inodes_with_positive_dentry_cache
  - inodes_with_negative_dentry_cache
  - dentry_invalidations_recieved
  - cache_limit
  - consumed_cache_size
  - inode_limit
  - consumed_inodes

2. Additional metrics added to md-cache xlator

Metrics added are:

  - stat_cache_hit_count
  - stat_cache_miss_count
  - xattr_cache_hit_count
  - xattr_cache_miss_count
  - nameless_lookup_count
  - negative_lookup_count
  - stat_cache_invalidations_received
  - xattr_cache_invalidations_received

3. Additional metrics added to quick-read xlator

Metrics added are:

  - total_files_cached
  - total_cache_used
  - cache-hit
  - cache-miss
  - cache-invalidations

Performance

1. Support for fuse writeback cache

Gluster FUSE mounts support a FUSE extension to leverage the kernel "writeback cache".

For usage help see man 8 glusterfs and man 8 mount.glusterfs, specifically the options --kernel-writeback-cache and --attr-times-granularity.
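A hedged example of such a mount follows; the exact option spellings should be verified against the man pages above, and the granularity value shown is purely illustrative:

```shell
# Enable the kernel writeback cache and advertise an attribute-time
# granularity to the kernel (option forms assumed from the man pages)
mount -t glusterfs -o kernel-writeback-cache=yes,attr-times-granularity=1 \
    <server>:<volname> /mnt/glusterfs
```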

2. Extended eager-lock to metadata transactions in replicate xlator

The eager lock feature in the replicate xlator is extended to support metadata transactions in addition to data transactions. This helps improve performance when the workload involves frequent metadata updates. This is typically seen with sharded volumes by default, and in other workloads that incur a higher rate of metadata modifications to the same set of files.

As a part of this feature, the compound FOPs feature in AFR is deprecated; volumes that are configured to leverage compounding will start disregarding the option use-compound-fops.

NOTE: This is an internal change in AFR xlator and is not user controlled or configurable.

3. Support for multi-threaded fuse readers

FUSE based mounts can specify the number of FUSE request processing threads at mount time. For workloads with high concurrency on a single client, this helps process FUSE requests in parallel, rather than with the existing single reader model.

This is provided as a mount time option named reader-thread-count and can be used as follows:

# mount -t glusterfs -o reader-thread-count=<n> <server>:<volname> <mntpoint>

4. Configurable aggregate size for write-behind xlator

The write-behind xlator provides the option performance.aggregate-size to enable configurable aggregate write sizes. This option enables the write-behind xlator to aggregate writes up to the specified value before the writes are sent to the bricks.

The existing behaviour sets this size to a maximum of 128KB per file. The configurable option provides the ability to tune this up or down based on the workload, to improve write performance.

Usage:

# gluster volume set <volname> performance.aggregate-size <size>

5. Adaptive read replica selection based on queue length

The AFR xlator is enhanced with a new value for the option read-hash-mode. Setting this option to 3 will distribute reads across AFR subvolumes based on the subvolume with the least outstanding read requests.

This helps distribute reads better across replicas, and hence improves read performance on replicate based volumes.
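The option belongs to the cluster namespace; the new behavior can be enabled as follows:

```shell
# Route reads to the replica with the fewest pending read requests
gluster volume set <volname> cluster.read-hash-mode 3
```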

Standalone

1. Thin arbiter quorum for 2-way replication

NOTE: This feature is available only with GlusterD2

Documentation for the feature is provided here.

2. Automatically configure backup volfile servers in clients

NOTE: This feature is available only with GlusterD2

Clients connecting to and mounting a Gluster volume will automatically fetch and configure backup volfile servers, to be used for future volfile fetches and updates when the initial server used to fetch the volfile and mount is down.

When using glusterd, this is achieved using the FUSE mount option backup-volfile-servers; when using GlusterD2, this is done automatically.
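With glusterd, the backup servers are listed explicitly at mount time; for example (host names here are illustrative):

```shell
# server2 and server3 are used for volfile fetches if server1 is unreachable
mount -t glusterfs -o backup-volfile-servers=server2:server3 \
    server1:/<volname> /mnt/glusterfs
```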

3. Python code base is now python3 compatible

All Python code in Gluster that is installed by the various packages is python3 compatible.

4. (c/m)time equivalence across replicate and disperse subvolumes

Enabling the utime feature enables Gluster to maintain consistent change and modification time stamps on files and directories across bricks.

This feature is useful when applications are sensitive to time deltas between operations (for example, tar may report "file changed as we read it"), as it maintains and reports equal time stamps on a file across the subvolumes.

To enable the feature use:

# gluster volume set <volname> features.utime on

Limitations:

  - Mounting a gluster volume with time attribute options (noatime, relatime, ...) is not supported with this feature
  - Certain entry operations (with differing creation flags) would reflect eventual consistency w.r.t. the time attributes
  - This feature does not guarantee consistent time for directories if the hashed sub-volume for the directory is down
  - readdirp (or directory listing) is not supported with this feature

Developer

1. New API for acquiring leases and acting on lease recalls

A new API to acquire a lease on an open file, and to receive callbacks when the lease is recalled, is provided with gfapi.

Refer to the header for details on how to use this API.

2. Extended language bindings for gfapi to include perl

See, libgfapi-perl - Libgfapi bindings for Perl using FFI

Major issues

None

Bugs addressed

Bugs addressed since release-4.0.0 are listed below.

  • #1074947: add option to build rpm without server
  • #1234873: glusterfs-resource-agents - volume - voldir is not properly set
  • #1272030: Remove lock recovery logic from client and server protocol translators
  • #1304962: Intermittent file creation fail,while doing concurrent writes on distributed volume has more than 40 bricks
  • #1312830: tests fail because bug-924726.t depends on netstat
  • #1319992: RFE: Lease support for gluster
  • #1450546: Paths to some tools are hardcoded to /sbin or /usr/sbin
  • #1450593: Gluster Python scripts do not check return value of find_library
  • #1468483: Sharding sends all application sent fsyncs to the main shard file
  • #1495153: xlator_t structure's 'client_latency' variable is not used
  • #1500649: Shellcheck errors in hook scripts
  • #1505355: quota: directories doesn't get heal on newly added bricks when quota is full on sub-directory
  • #1506140: Add quorum checks in post-op
  • #1507230: Man pages badly formatted
  • #1512691: PostgreSQL DB Restore: unexpected data beyond EOF
  • #1517260: Volume wrong size
  • #1521030: rpc: unregister programs before registering them again
  • #1523122: fix serval bugs found on testing protocol/client
  • #1523219: fuse xlator uses block size and fragment size 128KB leading to rounding off in df output
  • #1530905: Reducing regression time of glusterd test cases
  • #1533342: Syntactical errors in hook scripts for managing SELinux context on bricks
  • #1536024: Rebalance process is behaving differently for AFR and EC volume.
  • #1536186: build: glibc has removed legacy rpc headers and rpcgen in Fedora28, use libtirpc
  • #1537362: glustershd/glusterd is not using right port when connecting to glusterfsd process
  • #1537364: [RFE] - get-state option should mark profiling enabled flag at volume level
  • #1537457: DHT log messages: Found anomalies in (null) (gfid = 00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0
  • #1537602: Georeplication tests intermittently fail
  • #1538258: build: python-ctypes only in RHEL <= 7
  • #1538427: Seeing timer errors in the rebalance logs
  • #1539023: Add ability to control verbosity settings while compiling
  • #1539166: [bitrot] scrub ondemand reports it's start as success without additional detail
  • #1539358: Changes to self-heal logic w.r.t. detecting of split-brains
  • #1539510: Optimize glusterd_import_friend_volume code path
  • #1539545: gsyncd is running gluster command to get config file path is not required
  • #1539603: Glusterfs crash when doing statedump with memory accounting is disabled
  • #1540338: Change op-version of master to 4.1.0 for future options that maybe added
  • #1540607: glusterd fails to attach brick during restart of the node
  • #1540669: Do lock conflict check correctly for wait-list
  • #1541038: A down brick is incorrectly considered to be online and makes the volume to be started without any brick available
  • #1541264: dht_layout_t leak in dht_populate_inode_for_dentry
  • #1541916: The used space in the volume increases when the volume is expanded
  • #1542318: dht_lookup_unlink_of_false_linkto_cbk fails with "Permission denied"
  • #1542829: Too many log messages about dictionary and options
  • #1543279: Moving multiple temporary files to the same destination concurrently causes ESTALE error
  • #1544090: possible memleak in glusterfsd process with brick multiplexing on
  • #1544600: 3.8 -> 3.10 rolling upgrade fails (same for 3.12 or 3.13) on Ubuntu 14
  • #1544699: Rolling upgrade to 4.0 is broken
  • #1544961: libgfrpc does not export IPv6 RPC methods even with --with-ipv6-default
  • #1545048: [brick-mux] process termination race while killing glusterfsd on last brick detach
  • #1545056: [CIOT] : Gluster CLI says "io-threads : enabled" on existing volumes post upgrade.
  • #1545891: Provide a automated way to update bugzilla status with patch merge.
  • #1546129: Geo-rep: glibc fix breaks geo-replication
  • #1546620: DHT calls dht_lookup_everywhere for 1xn volumes
  • #1546954: [Rebalance] "Migrate file failed: : failed to get xattr [No data available]" warnings in rebalance logs
  • #1547068: Bricks getting assigned to different pids depending on whether brick path is IP or hostname based
  • #1547128: Typo error in __dht_check_free_space function log message
  • #1547662: After a replace brick command, self-heal takes some time to start healing files on disperse volumes
  • #1547888: [brick-mux] incorrect event-thread scaling in server_reconfigure()
  • #1548361: Make afr_fsync a transaction
  • #1549000: line-coverage tests not capturing details properly.
  • #1549606: Eager lock should be present for both metadata and data transactions
  • #1549915: [Fuse Sub-dir] After performing add-brick on volume,doing rm -rf * on subdir mount point fails with "Transport endpoint is not connected"
  • #1550078: memory leak in pre-op in replicate volumes for every write
  • #1550339: glusterd leaks memory when vol status is issued
  • #1550895: GD2 fails to dlopen server xlator
  • #1550936: Pause/Resume of geo-replication with wrong user specified returns success
  • #1553129: Memory corruption is causing crashes, hangs and invalid answers
  • #1553598: [Rebalance] ENOSPC errors on few files in rebalance logs
  • #1553926: configure --without-ipv6-default has odd behaviour
  • #1553938: configure summary TIRPC result is misleading
  • #1554053: 4.0 clients may fail to convert iatt in dict when recieving the same from older (< 4.0) servers
  • #1554743: [EC] Read performance of EC volume exported over gNFS is significantly lower than write performance
  • #1555154: glusterd: TLS verification fails when using intermediate CA instead of self-signed certificates
  • #1555167: namespace test failure
  • #1557435: Enable lookup-optimize by default
  • #1557876: Fuse mount crashed with only one VM running with its image on that volume
  • #1557932: Shard replicate volumes don't use eager-lock affectively
  • #1558016: test ./tests/bugs/ec/bug-1236065.t is generating crash on build
  • #1558074: [disperse] Add tests for in-memory stripe cache for the non aligned write
  • #1558380: Modify glfsheal binary to accept socket file path as an optional argument.
  • #1559004: /var/log/glusterfs/bricks/export_vdb.log flooded with this error message "Not able to add to index [Too many links]"
  • #1559075: enable ownthread feature for glusterfs4_0_fop_prog
  • #1559126: Incorrect error message in /features/changelog/lib/src/gf-history-changelog.c
  • #1559130: ssh stderr in glusterfind gets swallowed
  • #1559235: Increase the inode table size on server when upcall enabled
  • #1560319: NFS client gets "Invalid argument" when writing file through nfs-ganesha with quota
  • #1560393: Fix regresssion failure for ./tests/basic/md-cache/bug-1418249.t
  • #1560411: fallocate created data set is crossing storage reserve space limits resulting 100% brick full
  • #1560441: volume stop in mgmt v3
  • #1560589: nl-cache.t fails
  • #1560957: After performing remove-brick followed by add-brick operation, brick went offline state
  • #1561129: When storage reserve limit is reached, appending data to an existing file throws EROFS error
  • #1561406: Rebalance failures on a dispersed volume with lookup-optimize enabled
  • #1562052: build: revert configure --without-ipv6-default behaviour
  • #1562717: SHD is not healing entries in halo replication
  • #1562907: set mgmt_v3_timer->timer to NULL after mgmt_v3_timer is deleted
  • #1563273: mark brick as online only when portmap registration is completed
  • #1563334: Honour cluster.localtime-logging option for all the daemons
  • #1563511: Redundant synchronization in rename codepath for a single subvolume DHT
  • #1563945: [EC] Turn ON the stripe-cache option by default for ec volume
  • #1564198: [Remove-brick] Many files were not migrated from the decommissioned bricks; commit results in data loss
  • #1564235: gfapi: fix a couple of minor issues
  • #1564600: Client can create denial of service (DOS) conditions on server
  • #1566067: Volume status inode is broken with brickmux
  • #1566207: Linux kernel untar failed with "xz: (stdin): Read error: Invalid argument" immediate after add-brick
  • #1566303: Removing directories from multiple clients throws ESTALE errors
  • #1566386: Disable choose-local in groups virt and gluster-block
  • #1566732: EIO errors on some operations when volume has mixed brick versions on a disperse volume
  • #1567209: Geo-rep: faulty session due to OSError: [Errno 95] Operation not supported
  • #1567880: Grant Deepshikha access to all CI-related infrastructure
  • #1567881: Halo replication I/O path is not working
  • #1568348: Rebalance on few nodes doesn't seem to complete - stuck at FUTEX_WAIT
  • #1568521: shard files present even after deleting vm from ovirt UI
  • #1568820: Add generated HMAC token in header for webhook calls
  • #1568844: [snapshot-scheduler]Prevent access of shared storage volume from the outside client
  • #1569198: bitrot scrub status does not show the brick where the object (file) is corrupted
  • #1569489: Need heal-timeout to be configured as low as 5 seconds
  • #1570011: test case is failing ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t while brick mux is enabled
  • #1570538: linux untar errors out at completion during disperse volume inservice upgrade
  • #1570962: print the path of the corrupted object in scrub status
  • #1571069: [geo-rep]: Lot of changelogs retries and "dict is null" errors in geo-rep logs
  • #1572076: Dictionary response is not captured in syncop_(f)xattrop
  • #1572581: Remove-brick failed on Distributed volume while rm -rf is in-progress
  • #1572586: dht: do not allow migration if file is open
  • #1573066: growing glusterd memory usage with connected RHGSWA
  • #1573119: Amends in volume profile option 'gluster-block'
  • #1573220: Memory leak in volume tier status command
  • #1574259: Errors unintentionally reported for snapshot status
  • #1574305: rm command hangs in fuse_request_send
  • #1574606: the regression test "tests/bugs/posix/bug-990028.t" fails
  • #1575294: lease recall callback should be avoided on closed
  • #1575386: GlusterFS 4.1.0 tracker
  • #1575707: Gluster volume smb share options are getting overwritten after restating the gluster volume
  • #1576814: GlusterFS can be improved
  • #1577162: gfapi: broken symbol versions
  • #1579674: Remove EIO from the dht_inode_missing macro
  • #1579736: Additional log messages in dht_readdir(p)_cbk
  • #1579757: DHT Log flooding in mount log "key=trusted.glusterfs.dht.mds [Invalid argument]"
  • #1580215: [geo-rep]: Lot of changelogs retries and "dict is null" errors in geo-rep logs
  • #1580540: make getfattr return proper response for "glusterfs.gfidtopath" xattr for files created when gfid2path was off
  • #1581548: writes succeed when only good brick is down in 1x3 volume
  • #1581745: bug-1309462.t is failing reliably due to changes in security.capability changes in the kernel
  • #1582056: Input/Output errors on a disperse volume with concurrent reads and writes
  • #1582063: rpc: The gluster auth version is always AUTH_GLUSTERFS_v2
  • #1582068: ctime: Rename and unlink does not update ctime
  • #1582072: posix/ctime: Access time is not updated for file with a hardlink
  • #1582080: posix/ctime: The first lookup on file is not healing the gfid
  • #1582199: posix unwinds readdirp calls with readdir signature
  • #1582286: Brick-mux regressions failing on 4.1 branch
  • #1582531: posix/ctime: Mtime is not updated on setting it to older date
  • #1582549: api: missing __THROW on pub function decls
  • #1583016: libgfapi: glfs init fails on afr volume with ctime feature enabled
  • #1583734: rpc_transport_unref() called for an unregistered socket fd
  • #1583769: Fix incorrect rebalance log message
  • #1584633: Brick process crashed after upgrade from RHGS-3.3.1 async(7.4) to RHGS-3.4(7.5)
  • #1585894: posix/ctime: EC self heal of directory is blocked with ctime feature enabled
  • #1587908: Fix deadlock in failure codepath of shard fsync
  • #1590128: xdata is leaking in server3_3_seek