package WARC::Fields;						# -*- CPerl -*-

use strict;
use warnings;

our @ISA = qw();

require WARC; *WARC::Fields::VERSION = \$WARC::VERSION;

use overload '@{}' => \&_as_tied_array, '%{}' => \&_as_tied_hash;

=head1 NAME

WARC::Fields - WARC record headers and application/warc-fields

=head1 SYNOPSIS

  require WARC::Fields;

  $f = WARC::Fields->new;
  $f = $record->fields;			# get WARC record headers
  $g = $f->clone;			# make writable copy

  $g->set_readonly;			# make read-only

  $f->field('WARC-Type' => 'metadata');	# set
  $value = $f->field('WARC-Type');	# get
  $f->remove_field('WARC-Type');	# delete

  $fields_text = $f->as_string;		# get WARC header lines

  tie @field_names, ref $f, $f;		# bind ordered list of field names

  tie %fields, ref $f, $f;		# bind hash of field names => values

  $row = $f->[$num];			# tie an anonymous array and access it
  $value = $f->{$name};			# likewise with an anonymous tied hash

  $name = "$row";			# tied array returns objects
  $value = $row->value;			# one specific value
  $offset = $row->offset;		# N of M with same name

  foreach (keys %{$f}) { ... }		# iterate over names, in order

=head1 DESCRIPTION

The C<WARC::Fields> class encapsulates information in the
"application/warc-fields" format used for WARC record headers.  This is a
simple key-value format closely analogous to HTTP headers, but the
differences are significant enough that the C<HTTP::Headers> class cannot
be reliably reused for WARC fields.

Instances of this class are usually created as member variables of the
C<WARC::Record> class, but can also be returned as the content of WARC
records with Content-Type "application/warc-fields".

Instances of C<WARC::Fields> retrieved from WARC files are read-only and
will croak() if any attempt is made to change their contents.

This class strives to faithfully represent the contents of a WARC file,
although field names are matched case-insensitively, as the WARC standard
defines them.

Most WARC headers may only appear once and with a single value in valid
WARC records, with the notable exception of the WARC-Concurrent-To header.
C<WARC::Fields> neither attempts to enforce nor relies upon this
constraint.  Headers that appear multiple times are considered to have
multiple values; that is, the value associated with the header name will be
an array reference.  Similarly, the name of a recurring header is
B<repeated> in the tied array interface.  When iterating a tied hash, all
values of a recurring header are collected and returned with the B<first>
occurrence of its key.
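
The grouping described above can be sketched on a plain list of name/value
rows.  This is only an illustration: the row layout and the C<hash_view>
helper are assumptions made for the example and are not part of
C<WARC::Fields>.

  use strict;
  use warnings;

  # Hypothetical helper: collapse ordered [name, value] rows into the
  # ordered [name, value] pairs that iterating a hash view would yield.
  sub hash_view {
    my (@order, %values);
    foreach my $row (@_) {
      my ($name, $value) = @$row;
      # remember each name only at its first occurrence
      push @order, $name unless exists $values{lc $name};
      push @{$values{lc $name}}, $value;
    }
    # a single value stays a scalar; a recurring name yields an array ref
    return map { my $v = $values{lc $_}; [$_ => @$v > 1 ? $v : $v->[0]] }
      @order;
  }

  my @view = hash_view(['WARC-Type'          => 'metadata'],
                       ['WARC-Concurrent-To' => '<urn:uuid:A>'],
                       ['WARC-Date'          => '2019-01-01T00:00:00Z'],
                       ['WARC-Concurrent-To' => '<urn:uuid:B>']);

Here C<@view> holds three pairs in first-occurrence order, and the pair for
WARC-Concurrent-To carries both values in one array reference.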

As with C<HTTP::Headers>, the '_' character is converted to '-' in field
names unless the first character of the name is ':', which cannot itself
appear in a field name.  Unlike C<HTTP::Headers>, the leading ':' is
stripped off immediately and the name stored otherwise exactly as given.
The method and tied hash interfaces allow this convenience feature.  The
field names exposed via the tied array interface are reported B<exactly> as
they appear in the WARC file.

Strictly, "X-Crazy-Header" and "X_Crazy_Header" are two B<different>
headers that the above convenience mechanism conflates.  The solution is
simple: if (and only if) a header field B<already exists> with the B<exact>
name given, it is used; otherwise, y/_/-/ is applied and the name is
rechecked for another exact match.  If no match is found, case is folded
and a third check performed.  If a match is found, the existing header is
updated; otherwise, a new header is created with character case as given.

The WARC standard specifically states that field names are
case-insensitive; accordingly, "X-Crazy-Header" and "X-CRAZY-HeAdEr" are
considered the same header for the method and tied hash interfaces.  In the
tied array interface, however, they appear exactly as given.
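
The three checks can be sketched as a standalone function.  This is a
minimal illustration, not the module's implementation: the
C<find_field_name> helper is hypothetical, and applying the case-folded
check to names given with a leading ':' is an assumption.

  use strict;
  use warnings;

  # Hypothetical helper, not part of WARC::Fields: find which existing
  # name, if any, a requested name refers to.
  sub find_field_name {
    my ($want, @existing) = @_;

    # A leading ':' is stripped and suppresses the y/_/-/ step.
    my $literal = ($want =~ s/^://);

    # First check: exact match on the name as given.
    foreach my $have (@existing) { return $have if $have eq $want }

    # Second check: exact match after converting '_' to '-'.
    unless ($literal) {
      $want =~ y/_/-/;
      foreach my $have (@existing) { return $have if $have eq $want }
    }

    # Third check: match with case folded.
    foreach my $have (@existing) { return $have if lc $have eq lc $want }

    return;		# no match; the caller would create a new field
  }

With both "X-Crazy-Header" and "X_Crazy_Header" present, a request for
"X_Crazy_Header" resolves to the latter at the first check; with only
"X-Crazy-Header" present, it resolves to that name at the second check.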

=head2 Methods

=over

=item $f = WARC::Fields-E<gt>new

Construct a new C<WARC::Fields> object.  Initial contents can be passed as
key-value pairs to this constructor and will be added in the given order.

=cut

sub new {
}

=item $f-E<gt>clone

Copy a C<WARC::Fields> object.  A copy of a read-only object is writable.

=cut

sub clone {
}

=item $f-E<gt>field( $name )

=item $f-E<gt>field( $name =E<gt> $value )

=item $f-E<gt>field( $n1 =E<gt> $v1, $n2 =E<gt> $v2, ... )

Get or set the value of one or more fields.  The field name is not case
sensitive, but C<WARC::Fields> will preserve its case if a new entry is
created.

=cut

sub field {
}

=item $f = WARC::Fields-E<gt>parse( $text )

=item $f = WARC::Fields-E<gt>parse( from =E<gt> $fh )

Construct a new C<WARC::Fields> object, reading initial contents from the
provided text string or filehandle.

If the C<parse> method encounters a field name with a leading ':', which
implies an empty name and is not allowed, the leading ':' is silently
dropped from the line and parsing retried.  If the line is not valid after
this change, the C<parse> method croaks.
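
A minimal sketch of this leniency for a single line follows.  The
C<split_field_line> helper is hypothetical; it folds the retry into one
pattern with an optional leading ':' and ignores folded continuation
lines.

  use strict;
  use warnings;
  use Carp;

  # Hypothetical helper: split one header line into (name, value).
  sub split_field_line {
    my $line = shift;
    $line =~ s/\r?\n\z//;		# drop the line terminator, if any
    # an optional leading ':' is skipped before the name is captured
    $line =~ m/^:?([^:]+):\s*(.*)$/
      or croak "invalid field line: $line";
    return ($1, $2);
  }

  my ($name, $value) = split_field_line("WARC-Type: metadata\r\n");

A line such as ":Bogus: x" yields the name "Bogus", while a line with no
':' separator at all causes the helper to croak.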

=cut

sub parse {
  m/^:?([^:]+):\s*(.*)$/; # $1 -- name	$2 -- value
}

=item $f-E<gt>as_string

Return the contents as a formatted WARC header or application/warc-fields
block.

=cut

sub as_string {
}

=item $f-E<gt>set_readonly

Mark a C<WARC::Fields> object read-only.  All methods that modify the
object will croak() if called on a read-only object.

=cut

sub set_readonly {
}

=back

=head2 Tied Array Access

The order of field names can be fully controlled by tying an array to a
C<WARC::Fields> object and manipulating the array using ordinary Perl
operations.  Removing a name from the array effectively removes the field
from the object, but the value for that name is still remembered, allowing
names to be moved about without loss of data.

C<WARC::Fields> will croak() if an attempt is made to set a field name with
a leading ':' using the tied array interface.

=cut

sub TIEARRAY {
}

{
  package WARC::Fields::TiedArray::Row;

  use overload '""' => 'name';

=pod

The tied array interface accepts simple string values but returns objects
with additional information.  The returned object stringifies to the name
for that row but additionally has C<value> and C<offset> methods.

=over

=item $row = $array[$n]

=item $row = $f-E<gt>[$n]

The tied array C<FETCH> method returns a "row object" instead of the name
itself.

=cut

sub _new {
}

=item $name = "$row"

=item $name = $row-E<gt>name

=item $name = "$f-E<gt>[$n]"

=item $name = $f-E<gt>[$n]-E<gt>name

The C<name> method on a row object returns the field name.  Stringification
is overloaded to call this method.

=cut

sub name {
}

=item $value = $row-E<gt>value

=item $value = $array[$n]-E<gt>value

=item $value = $f-E<gt>[$n]-E<gt>value

The C<value> method on a row object returns the field value for this
particular row.  Only a single scalar is returned, even if multiple rows
share the same name.

=cut

sub value {
}

=item $offset = $row-E<gt>offset

=item $offset = $array[$n]-E<gt>offset

=item $offset = $f-E<gt>[$n]-E<gt>offset

The C<offset> method on a row object returns the position of this row
amongst multiple rows with the same field name.  These positions are
numbered from zero and are identical to the positions in the array
reference returned for this row's field name from the C<field> method or
the tied hash interface.

=cut

sub offset {
}

=back

=cut

}

{
  package WARC::Fields::TiedArray;

  sub FETCH {
  }

  sub STORE {
  }

  sub FETCHSIZE {
  }

  sub STORESIZE {
  }

  sub EXTEND {
  }

  sub EXISTS {
  }

  sub DELETE {
  }

  sub CLEAR {
  }

  sub PUSH {
  }

  sub POP {
  }

  sub SHIFT {
  }

  sub UNSHIFT {
  }

  sub SPLICE {
  }

  sub UNTIE {
  }

  sub DESTROY {
  }
}

=head2 Tied Hash Access

The contents of a C<WARC::Fields> object can be easily examined by tying a
hash to the object.  Reading or setting a hash key is equivalent to the
C<field> method, but the tied hash will iterate keys and values in the
order in which each key B<first> appears in the internal list.

=cut

sub TIEHASH {
}

{
  package WARC::Fields::TiedHash;

  sub FETCH {
  }

  sub STORE {
  }

  sub DELETE {
  }

  sub CLEAR {
  }

  sub EXISTS {
  }

  sub FIRSTKEY {
  }

  sub NEXTKEY {
  }

  sub SCALAR {
  }

  sub UNTIE {
  }

  sub DESTROY {
  }
}

=head2 Overloaded Dereference Operators

The C<WARC::Fields> class provides overloaded dereference operators for
array and hash dereferencing.  The overloaded operators provide an
anonymous tied array or hash as needed, allowing the object itself to be
used as a reference to its tied array and hash interfaces.  There is a
caveat to be aware of, however, so read on.

=cut

sub _as_tied_array {
}

sub _as_tied_hash {
}

=head3 Reference Count Trickery with Overloaded Dereference Operators

To avoid problems, the underlying tied object is a reference to the parent
object.  For ordinary use of C<tie>, this is a strong reference.  The
anonymous tied array and hash, however, are cached in the object to avoid
having to tie a new object every time the dereference operators are used.

To prevent memory leaks due to circular references, the overloaded
dereference operators tie a I<weak> reference to the parent object.  The
tied aggregate always holds a strong reference to its object, but when the
dereference operators are used, that inner object is a I<weak> reference to
the actual C<WARC::Fields> object.

The caveat is this: do not attempt to save a reference to the array or hash
produced by dereferencing a C<WARC::Fields> object.  The parent
C<WARC::Fields> object must remain in scope for as long as any anonymous
tied aggregates exist.
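
The arrangement can be sketched with C<weaken> from L<Scalar::Util>.  The
C<Demo::Fields> and C<Demo::TiedArray> packages below are hypothetical
stand-ins, read-only for brevity, and are not the actual implementation.

  use strict;
  use warnings;

  {
    package Demo::TiedArray;
    use Scalar::Util qw(weaken);

    sub TIEARRAY {
      my ($class, $parent, $weak) = @_;
      my $slot = $parent;	# the tied object refers to the parent
      weaken $slot if $weak;	# weak only for the cached anonymous array
      return bless \$slot, $class;
    }

    # Once the parent is gone, the weak reference reads as undef.
    sub FETCHSIZE {
      my $p = ${$_[0]};
      return defined $p ? scalar @{$p->{names}} : 0;
    }
    sub FETCH {
      my $p = ${$_[0]};
      return defined $p ? $p->{names}[$_[1]] : undef;
    }
  }

  {
    package Demo::Fields;
    use overload '@{}' => \&_as_tied_array;

    sub new { my $class = shift; return bless { names => [@_] }, $class }

    sub _as_tied_array {
      my $self = shift;
      unless ($self->{cache}) {
        tie my @view, 'Demo::TiedArray', $self, 1;
        $self->{cache} = \@view;	# cached, so it must not own $self
      }
      return $self->{cache};
    }
  }

  my $f = Demo::Fields->new(qw(WARC-Type WARC-Date));
  my $size = @$f;		# the object acts as an array reference
  my $saved = \@$f;		# the mistake the caveat warns against
  undef $f;			# parent destroyed; $saved is now orphaned

In this sketch the orphaned array merely reports itself as empty; how a
real orphaned aggregate behaves is not specified here.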

=cut

1;
__END__

=head1 AUTHOR

Jacob Bachmeyer, E<lt>jcb@cpan.orgE<gt>

=head1 SEE ALSO

L<WARC>, L<HTTP::Headers>, L<Scalar::Util> for C<weaken>

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2019 by Jacob Bachmeyer

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

=cut
