bp.html
+12-14
@@ -650,7 +650,7 @@ <h2>Introduction</h2>
650
650
651
651
<p>Depending on circumstances, sensitive information about individuals might include full name, home address, email address, national identification number, IP address, vehicle registration plate number, driver's license number, face, fingerprints, or handwriting, credit card numbers, digital identity, date of birth, birthplace, genetic information, telephone number, login name, screen name, nickname, health records etc. Although it is likely to be safe to share some of that information openly, and even more within a controlled environment, publishers should bear in mind that combining data from multiple sources may allow inadvertent identification of individuals.</p>
652
652
653
-
<p>A general Best Practice for publishing Data on the Web is to use standards. Different types of organizations specify standards that are specific to the publishing of datasets related to particular domains & applications, involving communities of users interested in that data. These standards define a common way of communicating information among the users of these communities. For example, there are two standards that can be used to publish transport timetables: the General Transit Feed Specification [[GTFS]] and the Service Interface for Real Time Information [[SIRI]]. These specify, in a mixed way, standardized terms, standardized data formats and standardized data access. Another general Best Practice is to use Unicode for handling character and string data. Unicode improves multilingual text processing and makes easier software localization easier. The Best Practices set out in this document serve a general purpose of publishing and using Data on the Web and are domain & application independent. They can be extended or complemented by other Best Practices documents or standards that cover more specialized contexts.</p>
653
+
<p>A general Best Practice for publishing Data on the Web is to use standards. Different types of organizations specify standards that are specific to the publishing of datasets related to particular domains & applications, involving communities of users interested in that data. These standards define a common way of communicating information among the users of these communities. For example, there are two standards that can be used to publish transport timetables: the General Transit Feed Specification [[GTFS]] and the Service Interface for Real Time Information [[SIRI]]. These specify, in a mixed way, standardized terms, standardized data formats and standardized data access. Another general Best Practice is to use Unicode for handling character and string data. Unicode improves multilingual text processing and makes software localization easier. The Best Practices set out in this document serve a general purpose of publishing and using Data on the Web and are domain & application independent. They can be extended or complemented by other Best Practices documents or standards that cover more specialized contexts.</p>
654
654
<!-- <p>Taking that into account, this document sets out a series of Best Practices that will help publishers and consumers face the new challenges and opportunities posed by data on the Web. They intend to serve a general purpose of publishing and using Data on the Web, but they may be specialized according to specific domains, such as Spatial Data on the Web Best Practices [[SDW-BP]].</p>-->
655
655
<p>Best Practices cover different aspects related to data publishing and
656
656
consumption, like data formats, data access, data identifiers and
<p>Humans will be able to understand data license information describing possible restrictions placed on the use of a given distribution and software agents to automatically detect the data license of a distribution.</p>
1216
+
<p>Humans will be able to understand data license information describing possible restrictions placed on the use of a given distribution, and software agents will be able to automatically detect the data license of a distribution.</p>
1217
1217
</section>
1218
1218
<section class="how">
1219
1219
<h4 class="subhead">Possible Approach to Implementation</h4>
becomes useful when it has been processed and transformed into
2107
2107
information. Note that there is an important distinction between formats that can be read and edited by humans using a computer and formats that are machine-readable. The latter term implies that the data is readily extracted, transformed and processed by a computer. </p>
2108
2108
<p>Using non-standard data formats is costly and inefficient, and
2109
-
the data may lose meaning as it is transformed. On the other hand,
2109
+
the data may lose meaning as it is transformed. By contrast,
2110
2110
standardized data formats enable interoperability as well as
2111
2111
future uses, such as remixing or visualization, many of which
2112
2112
cannot be anticipated when the data is first published. It is also important to note that most machine-readable standardized formats are also locale-neutral.</p>
<h4 class="subhead">Possible Approach to Implementation</h4>
2120
-
<p>Make data available in a machine-readable standardized data format that is easily parseable including but not limited to CSV, XML, HDF5, JSON and RDF serialization syntaxes like RDF/XML, JSON-LD, Turtle.</p>
2120
+
<p>Make data available in a machine-readable standardized data format that is easily parseable including but not limited to CSV, XML, HDF5, JSON and RDF serialization syntaxes like RDF/XML, JSON-LD, or Turtle.</p>
2121
2121
<aside class="example">
2122
2122
2123
2123
<p>John knows that tabular data is commonly used on the Web and he decides to use CSV as the data format for one of the distributions of the bus stops dataset. To facilitate data processing, he uses the <a href = https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/> Model for Tabular Data and Metadata on the Web</a> for publishing the CSV distribution (<code>stops-2015-05-05.csv</code>). The example below presents a fragment of the CSV distribution which complies with the structural metadata defined in <a href="#StructuralMetadata">Example 4</a>.</p>
@@ -2360,7 +2360,7 @@ <h4 class="subhead">Possible Approach to Implementation</h4>
2360
2360
provide lists of codes, terminologies and Linked Data vocabularies that can be used by everyone.
2361
2361
A key point is to make sure the dataset, or its documentation, provides enough (human- and machine-readable) context
2362
2362
so that data consumers can retrieve and exploit the standardized meaning of the values. In the context of the Web, using unambiguous, Web-based identifiers (URIs) for standardized vocabulary resources
2363
-
is an efficient way to do this, noting that the same URI may have multilingual labels attached for greater cross-border interoperability. The European Union's multilingual thesaurus, <a href="http://eurovoc.europa.eu/">Eurovoc</a> provides a prime example.</p>
2363
+
is an efficient way to do this, noting that the same URI may have multilingual labels attached for greater cross-border interoperability. The European Union's multilingual thesaurus, <a href="http://eurovoc.europa.eu/">Eurovoc</a>, provides a prime example.</p>
2364
2364
<aside class="example"><ol>
2365
2365
<li>The DCAT vocabulary expresses metadata concerning datasets [[VOCAB-DCAT]] and
2366
2366
re-uses elements from several pre-existing vocabularies: Dublin Core, FOAF, SKOS and vCard.
human-readable and machine-readable data, using RDFa for example.
2623
2623
However, as the Architecture of the Web [[WEBARCH]] and DCAT [[VOCAB-DCAT]] make clear,
2624
2624
a resource, such as a dataset, can have many representations. The same data might be available
2625
-
as JSON, XML, RDF, CSV and HTML. These multiple representations can be made available via and API but should be made available
2626
-
from <em>the same</em> URL using <a href="/DesignIssues/Conneg">content negotiation</a> to return the appropriate representation (what
2627
-
DCAT calls a distribution). Specific URIs can be used to identify individual representations of the data directly, by-passing
2628
-
content negotiation.</p>
2625
+
as JSON, XML, RDF, CSV and HTML. These multiple representations can be made available via an <abbr title="Application Programming Interface">API</abbr>, but should be made available from <em>the same</em> URL using <a href="/DesignIssues/Conneg">content negotiation</a> to return the appropriate representation (what DCAT calls a distribution). Specific URIs can be used to identify individual representations of the data directly, by-passing content negotiation.</p>
2629
2626
</section>
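The content negotiation described in the paragraph above can be illustrated with a minimal sketch, assuming a server that holds several representations of one dataset and picks one from the request's `Accept` header. The `REPRESENTATIONS` mapping, the file names, and the `negotiate` function are hypothetical and are not part of bp.html. The matching is deliberately simplified: it honors `q` values and wildcards but ignores the finer precedence rules of full HTTP negotiation.

```python
# Hypothetical mapping from media type to a stored distribution of one dataset.
REPRESENTATIONS = {
    "application/json": "stops.json",
    "application/xml": "stops.xml",
    "text/turtle": "stops.ttl",
    "text/csv": "stops.csv",
    "text/html": "stops.html",
}


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    items = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a media range without an explicit q counts as 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        # Sort by descending q, then by order of appearance in the header.
        items.append((-q, position, media_type))
    return [(media_type, -neg_q) for neg_q, _, media_type in sorted(items)]


def negotiate(accept_header, available=REPRESENTATIONS, default="text/html"):
    """Pick the media type to serve from the same URL, or None if none fits."""
    for media_type, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return default
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "text/*" becomes "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
    return None  # a real server would typically answer 406 Not Acceptable
```

A client that sends `Accept: text/csv` would be served the CSV distribution, while a browser sending a wildcard would fall back to the HTML representation, all from the same URL.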
2630
2627
<section class="outcome">
2631
2628
<h4 class="subhead">Intended Outcome</h4>
@@ -3149,7 +3146,7 @@ <h4 class="subhead">Possible Approach to Implementation</h4>
<p>A collection of international preferences, generally related to a language and geographic region that a (certain category) of users require. These are usually identified by a shorthand identifier or token, such as a language tag, that is passed from the environment to various processes to get culturally affected behavior</p><p>From <a href="https://www.w3.org/TR/ltli/#locale">Language Tags and Locale Identifiers for the World Wide Web</a> [[LTLI]].</p>
3736
+
<p>A collection of international preferences, generally related to a language and geographic region that a (certain category) of users require. These are usually identified by a shorthand identifier or token, such as a language tag, that is passed from the environment to various processes to get culturally affected behavior.</p>
3737
+
<p>From <a href="https://www.w3.org/TR/ltli/#locale">Language Tags and Locale Identifiers for the World Wide Web</a> [[LTLI]].</p>