Fossil

Check-in [fd1c9b09]
Login

Check-in [fd1c9b09]

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Back out [5bb921dd0893a548]. It turns out that REQUEST_URI should have the query string appended. Make other changes to cgi.c to bring it into "compliance". "Compliance" is in quotes because rfc3875 does not define REQUEST_URI. That variable is really just by conveniention. But Apache and Nginx both append the query string, so we should too.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | cgi-compliance
Files: files | file ages | folders
SHA3-256: fd1c9b090ac53c372704488bbad94f4c5eda276704597169bca002f6e7873520
User & Date: drh 2022-02-13 00:26:38
Context
2022-02-13
19:14
Improved robustness in CGI variable parsing. ... (Closed-Leaf check-in: b8973500 user: drh tags: cgi-compliance)
01:35
Cherry-pick from branch 'cgi-compliance' (and thus back out [5bb921dd0893a548]). Adapt the computation of g.zUrlSuffix in set_base_url() accordingly. ... (check-in: 5c649c7e user: george tags: base-href-fix)
00:26
Back out [5bb921dd0893a548]. It turns out that REQUEST_URI should have the query string appended. Make other changes to cgi.c to bring it into "compliance". "Compliance" is in quotes because rfc3875 does not define REQUEST_URI. That variable is really just by conveniention. But Apache and Nginx both append the query string, so we should too. ... (check-in: fd1c9b09 user: drh tags: cgi-compliance)
2022-02-12
20:53
Update the defense-against-robots documentation to align with current behavior. ... (check-in: c9082b29 user: drh tags: trunk)
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to src/cgi.c.

1172
1173
1174
1175
1176
1177
1178




1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
** the QUERY_STRING environment variable (if it exists), from standard
** input if there is POST data, and from HTTP_COOKIE.
**
** REQUEST_URI, PATH_INFO, and SCRIPT_NAME are related as follows:
**
**      REQUEST_URI == SCRIPT_NAME + PATH_INFO
**




** Where "+" means concatenate.  Fossil requires SCRIPT_NAME.  If
** REQUEST_URI is provided but PATH_INFO is not, then PATH_INFO is
** computed from REQUEST_URI and SCRIPT_NAME.  If PATH_INFO is provided
** but REQUEST_URI is not, then compute REQUEST_URI from PATH_INFO and
** SCRIPT_NAME.  If neither REQUEST_URI nor PATH_INFO are provided, then
** assume that PATH_INFO is an empty string and set REQUEST_URI equal
** to PATH_INFO.
**
** Sometimes PATH_INFO is missing and SCRIPT_NAME is not a prefix of
** REQUEST_URI.  (See https://fossil-scm.org/forum/forumpost/049e8650ed)
** In that case, truncate SCRIPT_NAME so that it is a proper prefix
** of REQUEST_URI.
**
** SCGI typically omits PATH_INFO.  CGI sometimes omits REQUEST_URI and
** PATH_INFO when it is empty.
**
** CGI Parameter quick reference:
**
**                                      REQUEST_URI
**                               _____________|____________
**                              /                          \
**    https://www.fossil-scm.org/forum/info/12736b30c072551a?t=c
**            \________________/\____/\____________________/ \_/
**                    |            |             |            |
**               HTTP_HOST         |        PATH_INFO     QUERY_STRING
**                            SCRIPT_NAME
*/
void cgi_init(void){







>
>
>
>



















|
|







1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
** the QUERY_STRING environment variable (if it exists), from standard
** input if there is POST data, and from HTTP_COOKIE.
**
** REQUEST_URI, PATH_INFO, and SCRIPT_NAME are related as follows:
**
**      REQUEST_URI == SCRIPT_NAME + PATH_INFO
**
** Or if QUERY_STRING is not empty:
**
**      REQUEST_URI == SCRIPT_NAME + PATH_INFO + '?' + QUERY_STRING
**
** Where "+" means concatenate.  Fossil requires SCRIPT_NAME.  If
** REQUEST_URI is provided but PATH_INFO is not, then PATH_INFO is
** computed from REQUEST_URI and SCRIPT_NAME.  If PATH_INFO is provided
** but REQUEST_URI is not, then compute REQUEST_URI from PATH_INFO and
** SCRIPT_NAME.  If neither REQUEST_URI nor PATH_INFO are provided, then
** assume that PATH_INFO is an empty string and set REQUEST_URI equal
** to PATH_INFO.
**
** Sometimes PATH_INFO is missing and SCRIPT_NAME is not a prefix of
** REQUEST_URI.  (See https://fossil-scm.org/forum/forumpost/049e8650ed)
** In that case, truncate SCRIPT_NAME so that it is a proper prefix
** of REQUEST_URI.
**
** SCGI typically omits PATH_INFO.  CGI sometimes omits REQUEST_URI and
** PATH_INFO when it is empty.
**
** CGI Parameter quick reference:
**
**                                      REQUEST_URI
**                               _____________|________________
**                              /                              \
**    https://www.fossil-scm.org/forum/info/12736b30c072551a?t=c
**            \________________/\____/\____________________/ \_/
**                    |            |             |            |
**               HTTP_HOST         |        PATH_INFO     QUERY_STRING
**                            SCRIPT_NAME
*/
void cgi_init(void){
1224
1225
1226
1227
1228
1229
1230




1231

1232
1233
1234
1235
1236
1237
1238
  /* We must have SCRIPT_NAME. If the web server did not supply it, try
  ** to compute it from REQUEST_URI and PATH_INFO. */
  if( zScriptName==0 ){
    size_t nRU, nPI;
    if( zRequestUri==0 || zPathInfo==0 ){
      malformed_request("missing SCRIPT_NAME");  /* Does not return */
    }




    nRU = strlen(zRequestUri);

    nPI = strlen(zPathInfo);
    if( nRU<nPI ){
      malformed_request("PATH_INFO is longer than REQUEST_URI");
    }
    zScriptName = fossil_strndup(zRequestUri,(int)(nRU-nPI));
    cgi_set_parameter("SCRIPT_NAME", zScriptName);
  }







>
>
>
>
|
>







1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
  /* We must have SCRIPT_NAME. If the web server did not supply it, try
  ** to compute it from REQUEST_URI and PATH_INFO. */
  if( zScriptName==0 ){
    size_t nRU, nPI;
    if( zRequestUri==0 || zPathInfo==0 ){
      malformed_request("missing SCRIPT_NAME");  /* Does not return */
    }
    z = strchr(zRequestUri,'?');
    if( z ){
      nRU = (int)(z - zRequestUri);
    }else{
      nRU = strlen(zRequestUri);
    }
    nPI = strlen(zPathInfo);
    if( nRU<nPI ){
      malformed_request("PATH_INFO is longer than REQUEST_URI");
    }
    zScriptName = fossil_strndup(zRequestUri,(int)(nRU-nPI));
    cgi_set_parameter("SCRIPT_NAME", zScriptName);
  }
1249
1250
1251
1252
1253
1254
1255

1256
1257
1258
1259



1260

1261
1262
1263
1264
1265
1266
1267
    for(j=i; zPathInfo[j] && zPathInfo[j]!='?'; j++){}
    zPathInfo = fossil_strndup(zPathInfo+i, j-i);
    cgi_replace_parameter("PATH_INFO", zPathInfo);
  }
#endif
  if( zRequestUri==0 ){
    const char *z = zPathInfo;

    if( zPathInfo==0 ){
      malformed_request("missing PATH_INFO and/or REQUEST_URI");
    }
    if( z[0]=='/' ) z++;



    zRequestUri = mprintf("%s/%s", zScriptName, z);

    cgi_set_parameter("REQUEST_URI", zRequestUri);
  }
  if( zPathInfo==0 ){
    int i, j;
    for(i=0; zRequestUri[i]==zScriptName[i] && zRequestUri[i]; i++){}
    for(j=i; zRequestUri[j] && zRequestUri[j]!='?'; j++){}
    zPathInfo = fossil_strndup(zRequestUri+i, j-i);







>




>
>
>
|
>







1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
    for(j=i; zPathInfo[j] && zPathInfo[j]!='?'; j++){}
    zPathInfo = fossil_strndup(zPathInfo+i, j-i);
    cgi_replace_parameter("PATH_INFO", zPathInfo);
  }
#endif
  if( zRequestUri==0 ){
    const char *z = zPathInfo;
    const char *zQS = cgi_parameter("QUERY_STRING",0);
    if( zPathInfo==0 ){
      malformed_request("missing PATH_INFO and/or REQUEST_URI");
    }
    if( z[0]=='/' ) z++;
    if( zQS && zQS[0] ){
      zRequestUri = mprintf("%s/%s?%s", zScriptName, z, zQS);
    }else{
      zRequestUri = mprintf("%s/%s", zScriptName, z);
    }
    cgi_set_parameter("REQUEST_URI", zRequestUri);
  }
  if( zPathInfo==0 ){
    int i, j;
    for(i=0; zRequestUri[i]==zScriptName[i] && zRequestUri[i]; i++){}
    for(j=i; zRequestUri[j] && zRequestUri[j]!='?'; j++){}
    zPathInfo = fossil_strndup(zRequestUri+i, j-i);
1864
1865
1866
1867
1868
1869
1870

1871
1872
1873
1874
1875
1876
1877
1878
1879
1880
1881
  }
  cgi_setenv("GATEWAY_INTERFACE","CGI/1.0");
  cgi_setenv("REQUEST_METHOD",zToken);
  zToken = extract_token(z, &z);
  if( zToken==0 ){
    malformed_request("malformed URL in HTTP header");
  }

  cgi_setenv("SCRIPT_NAME", "");
  for(i=0; zToken[i] && zToken[i]!='?'; i++){}
  if( zToken[i] ) zToken[i++] = 0;
  cgi_setenv("REQUEST_URI", zToken);
  cgi_setenv("PATH_INFO", zToken);
  cgi_setenv("QUERY_STRING", &zToken[i]);
  if( zIpAddr==0 ){
    zIpAddr = cgi_remote_ip(fileno(g.httpIn));
  }
  if( zIpAddr ){
    cgi_setenv("REMOTE_ADDR", zIpAddr);







>



<







1878
1879
1880
1881
1882
1883
1884
1885
1886
1887
1888

1889
1890
1891
1892
1893
1894
1895
  }
  cgi_setenv("GATEWAY_INTERFACE","CGI/1.0");
  cgi_setenv("REQUEST_METHOD",zToken);
  zToken = extract_token(z, &z);
  if( zToken==0 ){
    malformed_request("malformed URL in HTTP header");
  }
  cgi_setenv("REQUEST_URI", zToken);
  cgi_setenv("SCRIPT_NAME", "");
  for(i=0; zToken[i] && zToken[i]!='?'; i++){}
  if( zToken[i] ) zToken[i++] = 0;

  cgi_setenv("PATH_INFO", zToken);
  cgi_setenv("QUERY_STRING", &zToken[i]);
  if( zIpAddr==0 ){
    zIpAddr = cgi_remote_ip(fileno(g.httpIn));
  }
  if( zIpAddr ){
    cgi_setenv("REMOTE_ADDR", zIpAddr);