Java equivalent to JavaScript's encodeURIComponent that produces identical output?

Question 1

I've been experimenting with various bits of Java code, trying to come up with something that will encode a string containing quotes, spaces and "exotic" Unicode characters and produce output that's identical to JavaScript's encodeURIComponent function.

My torture test string is: "A" B ± "

If I enter the following JavaScript statement in Firebug:

encodeURIComponent('"A" B ± "');

Then I get:

"%22A%22%20B%20%C2%B1%20%22"

Here's my little test Java program:

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodingTest
{
  public static void main(String[] args) throws UnsupportedEncodingException
  {
    String s = "\"A\" B ± \"";
    System.out.println("URLEncoder.encode returns "
      + URLEncoder.encode(s, "UTF-8"));

    System.out.println("getBytes returns "
      + new String(s.getBytes("UTF-8"), "ISO-8859-1"));
  }
}

This program outputs:

URLEncoder.encode returns %22A%22+B+%C2%B1+%22
getBytes returns "A" B ± "

Close, but no cigar! What is the best way of encoding a UTF-8 string using Java so that it produces the same output as JavaScript's encodeURIComponent?

EDIT: I'm using Java 1.4, moving to Java 5 shortly.

Question 2

Looking at the implementation differences, I see that:

MDC on encodeURIComponent():

literal characters (regex representation): [-a-zA-Z0-9._*~'()!]

Java 1.5.0 documentation on URLEncoder:

literal characters (regex representation): [-a-zA-Z0-9._*]
the space character " " is converted into a plus sign "+".

So basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:

replace all occurrences of "+" with "%20"
replace all occurrences of "%xx" representing any of [~'()!] back to their literal counterparts
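
The difference between the two literal sets can be checked mechanically rather than read off the documentation. Here is a minimal sketch (the class name EncoderDiff and the method needsRestoring are made up for illustration) that runs each punctuation character from the MDC list through URLEncoder and collects the ones that come back percent-encoded. Those are the characters the second post-processing step has to restore.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of any library.
final class EncoderDiff
{
  // Punctuation that encodeURIComponent leaves literal, per the MDC list above.
  static final String JS_LITERAL_PUNCTUATION = "-_.*~'()!";

  // Returns the characters from that list which URLEncoder percent-encodes.
  static String needsRestoring()
  {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < JS_LITERAL_PUNCTUATION.length(); i++)
    {
      String c = String.valueOf(JS_LITERAL_PUNCTUATION.charAt(i));
      if (!encode(c).equals(c))
      {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  // Thin wrapper so callers do not have to handle the checked exception.
  static String encode(String s)
  {
    try
    {
      return URLEncoder.encode(s, "UTF-8");
    }
    catch (UnsupportedEncodingException e)
    {
      // UTF-8 support is mandatory for every JVM, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args)
  {
    System.out.println("restore after encoding: " + needsRestoring());
    System.out.println("space becomes: " + encode(" "));
  }
}
```

The characters it reports should match the [~'()!] set listed in the second post-processing step.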

Question 3

This is the class I came up with in the end:

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/**
 * Utility class for JavaScript compatible UTF-8 encoding and decoding.
 * 
 * @see http://stackoverflow.com/questions/607176/java-equivalent-to-javascripts-encodeuricomponent-that-produces-identical-output
 * @author John Topley 
 */
public class EncodingUtil
{
  /**
   * Decodes the passed UTF-8 String using an algorithm that's compatible with
   * JavaScript's <code>decodeURIComponent</code> function. Returns
   * <code>null</code> if the String is <code>null</code>.
   *
   * @param s The UTF-8 encoded String to be decoded
   * @return the decoded String
   */
  public static String decodeURIComponent(String s)
  {
    if (s == null)
    {
      return null;
    }

    String result = null;

    try
    {
      result = URLDecoder.decode(s, "UTF-8");
    }

    // This exception should never occur.
    catch (UnsupportedEncodingException e)
    {
      result = s;  
    }

    return result;
  }

  /**
   * Encodes the passed String as UTF-8 using an algorithm that's compatible
   * with JavaScript's <code>encodeURIComponent</code> function. Returns
   * <code>null</code> if the String is <code>null</code>.
   * 
   * @param s The String to be encoded
   * @return the encoded String
   */
  public static String encodeURIComponent(String s)
  {
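    // Honor the documented contract: a null input yields null
    // (URLEncoder.encode would otherwise throw a NullPointerException).
    if (s == null)
    {
      return null;
    }

    String result = null;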

    try
    {
      result = URLEncoder.encode(s, "UTF-8")
                         .replaceAll("\\+", "%20")
                         .replaceAll("\\%21", "!")
                         .replaceAll("\\%27", "'")
                         .replaceAll("\\%28", "(")
                         .replaceAll("\\%29", ")")
                         .replaceAll("\\%7E", "~");
    }

    // This exception should never occur.
    catch (UnsupportedEncodingException e)
    {
      result = s;
    }

    return result;
  }  

  /**
   * Private constructor to prevent this class from being instantiated.
   */
  private EncodingUtil()
  {
    super();
  }
}
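
One caveat with the decodeURIComponent method above: it delegates to URLDecoder, which treats a + as an encoded space, whereas JavaScript's decodeURIComponent leaves a literal + untouched. If that matters for your input, one possible workaround is to protect the plus signs before decoding. This is only a sketch under that assumption, and the class name JsCompatibleDecoder is hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical variant for illustration; not the class from the answer above.
final class JsCompatibleDecoder
{
  static String decodeURIComponent(String s)
  {
    if (s == null)
    {
      return null;
    }

    try
    {
      // Turn every literal '+' into its escape first, so that URLDecoder
      // decodes it back to '+' instead of converting it into a space.
      return URLDecoder.decode(s.replace("+", "%2B"), "UTF-8");
    }
    catch (UnsupportedEncodingException e)
    {
      // UTF-8 support is mandatory for every JVM, so this should not happen.
      throw new IllegalStateException(e);
    }
  }
}
```

With this variant "a+b%20c" decodes to "a+b c", while a plain URLDecoder.decode call would give "a b c".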

Question 4

Using the JavaScript engine that ships with Java 6:


import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class Wow
{
    public static void main(String[] args) throws Exception
    {
        ScriptEngineManager factory = new ScriptEngineManager();
        ScriptEngine engine = factory.getEngineByName("JavaScript");
        engine.eval("print(encodeURIComponent('\"A\" B ± \"'))");
    }
}

Output: %22A%22%20B%20%c2%b1%20%22

The case is different, but it's closer to what you want.

Question 5

I use java.net.URI#getRawPath(), e.g.

String s = "a+b c.html";
String fixed = new URI(null, null, s, null).getRawPath();

The value of fixed will be a+b%20c.html, which is what you want.

Post-processing the output of URLEncoder.encode() will obliterate any pluses that are supposed to be in the URI. For example

URLEncoder.encode("a+b c.html").replaceAll("\\+", "%20");

will give you a%20b%20c.html, which will be interpreted as a b c.html.
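
To make the getRawPath() behavior easier to try out, here is a small sketch (RawPathDemo and rawPath are names invented for this example). It also shows a limitation worth keeping in mind: java.net.URI applies path quoting rules, so, as far as I can tell from its documentation, characters that are legal in a path, such as & and =, are left alone, whereas encodeURIComponent would escape them. That makes this approach a fit for path segments rather than for query parameter values.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; only java.net.URI itself is a real API here.
final class RawPathDemo
{
  // Quotes a relative path using the multi-argument URI constructor.
  static String rawPath(String path)
  {
    try
    {
      return new URI(null, null, path, null).getRawPath();
    }
    catch (URISyntaxException e)
    {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args)
  {
    // The space is quoted, the plus sign is kept as-is.
    System.out.println(rawPath("a+b c.html"));

    // '&' and '=' are legal path characters, so they are not escaped.
    System.out.println(rawPath("a&b=c d"));
  }
}
```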

Question 6

I came up with my own version of encodeURIComponent, because the posted solution has one problem: if a + that should be encoded is present in the String, it will be converted to a space.

So here's my class:

import java.io.UnsupportedEncodingException;
import java.util.BitSet;

public final class EscapeUtils
{
    /** used for the encodeURIComponent function */
    private static final BitSet dontNeedEncoding;

    static
    {
        dontNeedEncoding = new BitSet(256);

        // a-z
        for (int i = 97; i <= 122; ++i)
        {
            dontNeedEncoding.set(i);
        }
        // A-Z
        for (int i = 65; i <= 90; ++i)
        {
            dontNeedEncoding.set(i);
        }
        // 0-9
        for (int i = 48; i <= 57; ++i)
        {
            dontNeedEncoding.set(i);
        }

        // '()*
        for (int i = 39; i <= 42; ++i)
        {
            dontNeedEncoding.set(i);
        }
        dontNeedEncoding.set(33); // !
        dontNeedEncoding.set(45); // -
        dontNeedEncoding.set(46); // .
        dontNeedEncoding.set(95); // _
        dontNeedEncoding.set(126); // ~
    }

    /**
     * A Utility class should not be instantiated.
     */
    private EscapeUtils()
    {

    }

    /**
     * Escapes all characters except the following: alphabetic, decimal digits, - _ . ! ~ * ' ( )
     * 
     * @param input
     *            A component of a URI
     * @return the escaped URI component
     */
    public static String encodeURIComponent(String input)
    {
        if (input == null)
        {
            return input;
        }

        StringBuilder filtered = new StringBuilder(input.length());
        char c;
        for (int i = 0; i < input.length(); ++i)
        {
            c = input.charAt(i);
            if (dontNeedEncoding.get(c))
            {
                filtered.append(c);
            }
            else
            {
                final byte[] b = charToBytesUTF(c);

                for (int j = 0; j < b.length; ++j)
                {
                    filtered.append('%');
                    filtered.append("0123456789ABCDEF".charAt(b[j] >> 4 & 0xF));
                    filtered.append("0123456789ABCDEF".charAt(b[j] & 0xF));
                }
            }
        }
        return filtered.toString();
    }

    private static byte[] charToBytesUTF(char c)
    {
        try
        {
            return new String(new char[] { c }).getBytes("UTF-8");
        }
        catch (UnsupportedEncodingException e)
        {
            return new byte[] { (byte) c };
        }
    }
}
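
One thing to watch with the class above: it converts one char at a time, so a character outside the Basic Multilingual Plane (stored in Java as a surrogate pair) is split into two halves that cannot be encoded as UTF-8 on their own. A possible variation is to walk the string by code point instead. The sketch below assumes Java 7 or later (for StandardCharsets), and the class name CodePointEscaper is made up for illustration:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical variant of the class above that iterates over code points.
final class CodePointEscaper
{
    // Characters that encodeURIComponent leaves unescaped.
    private static final String UNESCAPED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.!~*'()";
    private static final String HEX = "0123456789ABCDEF";

    static String encodeURIComponent(String input)
    {
        if (input == null)
        {
            return null;
        }

        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); )
        {
            // Read a full code point, which may span two chars.
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);

            if (cp < 128 && UNESCAPED.indexOf(cp) >= 0)
            {
                out.append((char) cp);
            }
            else
            {
                // Percent-encode every UTF-8 byte of the code point.
                byte[] bytes = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes)
                {
                    out.append('%');
                    out.append(HEX.charAt((b >> 4) & 0xF));
                    out.append(HEX.charAt(b & 0xF));
                }
            }
        }
        return out.toString();
    }
}
```

For the torture string from the question this yields %22A%22%20B%20%C2%B1%20%22, and "a+b" becomes a%2Bb. One remaining difference from JavaScript: an unpaired surrogate makes encodeURIComponent throw a URIError, while this sketch does not raise an error for it.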

Question 7

I found another implementation documented at http://blog.sangupta.com/2010/05/encodeuricomponent-and.html. The implementation can also handle Unicode bytes.

Question 8

I have successfully used the java.net.URI class like so:

import java.net.URI;
import java.net.URISyntaxException;

public static String uriEncode(String string) {
    String result = string;
    if (null != string) {
        try {
            String scheme = null;
            String ssp = string;
            int es = string.indexOf(':');
            if (es > 0) {
                scheme = string.substring(0, es);
                ssp = string.substring(es + 1);
            }
            result = (new URI(scheme, ssp, null)).toString();
        } catch (URISyntaxException usex) {
            // ignore and use string that has syntax error
        }
    }
    return result;
}

Question 9

Here is a straightforward example of Ravi Wallau's solution:

public String buildSafeURL(String partialURL, String documentName)
        throws ScriptException {
    ScriptEngineManager scriptEngineManager = new ScriptEngineManager();
    ScriptEngine scriptEngine = scriptEngineManager
            .getEngineByName("JavaScript");

    String urlSafeDocumentName = String.valueOf(scriptEngine
            .eval("encodeURIComponent('" + documentName + "')"));
    String safeURL = partialURL + urlSafeDocumentName;

    return safeURL;
}

public static void main(String[] args) {
    EncodeURIComponentDemo demo = new EncodeURIComponentDemo();
    String partialURL = "https://www.website.com/document/";
    String documentName = "Tom & Jerry Manuscript.pdf";

    try {
        System.out.println(demo.buildSafeURL(partialURL, documentName));
    } catch (ScriptException se) {
        se.printStackTrace();
    }
}

Output: https://www.website.com/document/Tom%20%26%20Jerry%20Manuscript.pdf

It also answers the lingering question in the comments by Loren Shqipognja on how to pass a String variable to encodeURIComponent(). The method scriptEngine.eval() returns an Object, so it can be converted to a String via String.valueOf(), among other methods.
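
A note on the string concatenation in buildSafeURL: a document name that contains a single quote (or a backslash) ends up inside the script source and breaks the JavaScript string literal. One way around that, sketched below, is to hand the value to the engine as a variable with ScriptEngine.put() and refer to it by name in the script. BindingDemo, encodeViaEngine and the variable name input are invented for this example. Also note that recent JDKs no longer bundle a JavaScript engine, in which case getEngineByName returns null, so the sketch returns null as well.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical demo class; the javax.script calls are the standard API.
final class BindingDemo
{
    // Returns the encoded value, or null when no JavaScript engine is installed.
    static String encodeViaEngine(String value)
    {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine == null)
        {
            return null;
        }

        try
        {
            // Pass the Java String as a script variable instead of splicing
            // it into the script text, so quotes in the value are harmless.
            engine.put("input", value);
            return String.valueOf(engine.eval("encodeURIComponent(input)"));
        }
        catch (ScriptException e)
        {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(encodeViaEngine("Tom's & Jerry Manuscript.pdf"));
    }
}
```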

Question 10

For me this worked:

import org.apache.http.client.utils.URIBuilder;

String encodedString = new URIBuilder()
  .setParameter("i", stringToEncode)
  .build()
  .getRawQuery() // output: i=encodedString
  .substring(2);

or with a different UriBuilder

import javax.ws.rs.core.UriBuilder;

String encodedString = UriBuilder.fromPath("")
  .queryParam("i", stringToEncode)
  .toString()   // output: ?i=encodedString
  .substring(3);

In my opinion, using a standard library is a better idea than manual post-processing. Also, @Chris's answer looked good, but it doesn't work for URLs like "http://a+b c.html"

Question 11

This is what I'm using:

import java.nio.charset.StandardCharsets;

private static final String HEX = "0123456789ABCDEF";

public static String encodeURIComponent(String str) {
    if (str == null) return null;

    byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
    StringBuilder builder = new StringBuilder(bytes.length);

    for (byte c : bytes) {
        if (c >= 'a' ? c <= 'z' || c == '~' :
            c >= 'A' ? c <= 'Z' || c == '_' :
            c >= '0' ? c <= '9' :  c == '-' || c == '.')
            builder.append((char)c);
        else
            builder.append('%')
                   .append(HEX.charAt(c >> 4 & 0xf))
                   .append(HEX.charAt(c & 0xf));
    }

    return builder.toString();
}

It goes beyond JavaScript's behavior by percent-encoding every character that is not an unreserved character according to RFC 3986.

Here is the opposite conversion:

public static String decodeURIComponent(String str) {
    if (str == null) return null;

    int length = str.length();
    byte[] bytes = new byte[length / 3];
    StringBuilder builder = new StringBuilder(length);

    for (int i = 0; i < length; ) {
        char c = str.charAt(i);
        if (c != '%') {
            builder.append(c);
            i += 1;
        } else {
            int j = 0;
            do {
                char h = str.charAt(i + 1);
                char l = str.charAt(i + 2);
                i += 3;

                h -= '0';
                if (h >= 10) {
                    h |= ' ';
                    h -= 'a' - '0';
                    if (h >= 6) throw new IllegalArgumentException();
                    h += 10;
                }

                l -= '0';
                if (l >= 10) {
                    l |= ' ';
                    l -= 'a' - '0';
                    if (l >= 6) throw new IllegalArgumentException();
                    l += 10;
                }

                bytes[j++] = (byte)(h << 4 | l);
                if (i >= length) break;
                c = str.charAt(i);
            } while (c == '%');
            builder.append(new String(bytes, 0, j, StandardCharsets.UTF_8));
        }
    }

    return builder.toString();
}
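
If you want the same RFC 3986 strictness as the encoder above but would rather lean on URLEncoder, the post-processing idea from the earlier answers can be adapted. URLEncoder already escapes ! ' ( ), so only three substitutions remain: + to %20, * to %2A, and %7E back to ~. This is a sketch under that reading of the URLEncoder documentation, and the class name Rfc3986Encoder is hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of any library.
final class Rfc3986Encoder {
    static String encode(String s) {
        if (s == null) return null;

        try {
            return URLEncoder.encode(s, "UTF-8")
                    .replace("+", "%20")   // URLEncoder writes spaces as '+'
                    .replace("*", "%2A")   // '*' is not unreserved in RFC 3986
                    .replace("%7E", "~");  // '~' is unreserved in RFC 3986
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory for every JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

A literal + in the input is safe here, because URLEncoder has already turned it into %2B before the first replacement runs.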

Question 12

I found the PercentEscaper class from the google-http-java-client library, which can be used to implement encodeURIComponent quite easily.

PercentEscaper from the google-http-java-client javadoc; google-http-java-client home

Question 13

The Guava library has PercentEscaper:

Escaper percentEscaper = new PercentEscaper("-_.*", false);

"-_.*" are the safe characters

false tells PercentEscaper to escape spaces with "%20", not "+"

Question 14

I used String encodedUrl = new URI(null, url, null).toASCIIString(); to encode URLs. To add parameters after the existing ones in the url, I use UriComponentsBuilder.